Troubleshooting

Use the docs search to look up endpoint names, job types, environment variables, and component names. Useful search terms include campaign_id, wms_use_mapcache, CELERY_MAX_TASKS_PER_CHILD, seed-mapcache, and storage/cleanup-tmp.

For a service-by-service recovery sequence, see Failure and Recovery.

Stack Boots But Frontend or Docs Return 404

Check the actual runtime path first.

In this repository, the local stack currently runs as a swarm deployment driven by reload_stack.sh and local-stack.yml, not as a plain compose-only setup.

For docs specifically, verify that /guide/ is mounted and that the image build included the docs stage.

Swagger Works but Cached GetMap Fails

Check:

  • wms_use_mapcache
  • native grid availability in generated mapcache.xml
  • layer directory ownership under /var/sig/tiles

If the requested grid is not supported by the cache, the API should fall back to direct MapServer rendering.
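
The grid check and the fallback rule can be sketched in a few lines. This is an illustration only, not the API's actual logic: the function names, the sample XML, and the layer_a tileset are hypothetical, and the sketch assumes the generated mapcache.xml uses the standard MapCache layout in which each <tileset> lists its grids in <grid> child elements.

```python
import xml.etree.ElementTree as ET


def grids_by_tileset(mapcache_xml):
    """Map each tileset name to the list of grid names it references."""
    root = ET.fromstring(mapcache_xml)
    result = {}
    for tileset in root.findall("tileset"):
        grids = [g.text.strip() for g in tileset.findall("grid") if g.text]
        result[tileset.get("name", "")] = grids
    return result


def choose_renderer(use_mapcache, tileset, grid, grids):
    """Return "mapcache" only when caching is on and the grid is available."""
    if use_mapcache and grid in grids.get(tileset, []):
        return "mapcache"
    return "mapserver"  # fall back to direct MapServer rendering


# Hypothetical sample standing in for the generated mapcache.xml.
SAMPLE = """<mapcache>
  <tileset name="layer_a">
    <grid>GoogleMapsCompatible</grid>
  </tileset>
</mapcache>"""

grids = grids_by_tileset(SAMPLE)
```

Running grids_by_tileset against the real generated file shows quickly whether the grid a failing request needs is listed for that layer.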

Worker Memory Growth

Symptoms:

  • Celery RSS grows across multiple point-cloud or HySpex tasks
  • large bulk campaigns slow down or destabilize the host

Actions:

  • lower CELERY_MAX_TASKS_PER_CHILD
  • keep concurrency moderate
  • treat the memory-per-child limit as a soft trigger, not a hard boundary
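
As a rough sketch of how these actions map onto Celery, the helper below builds the two worker recycling settings. It assumes that CELERY_MAX_TASKS_PER_CHILD feeds Celery's worker_max_tasks_per_child setting in this project, which the source does not confirm. The helper name, the default of 20 tasks, and the max_memory_mib parameter are hypothetical. The setting names worker_max_tasks_per_child and worker_max_memory_per_child are Celery's own.

```python
import os


def worker_recycling_settings(env=None, default_max_tasks=20,
                              max_memory_mib=None):
    """Build Celery worker recycling settings from the environment."""
    env = os.environ if env is None else env
    max_tasks = int(env.get("CELERY_MAX_TASKS_PER_CHILD", default_max_tasks))
    settings = {
        # Replace a worker process after it has run this many tasks.
        "worker_max_tasks_per_child": max_tasks,
    }
    if max_memory_mib is not None:
        # Celery expects this limit in kilobytes. It is checked after a task
        # finishes, so a single large task can still exceed it.
        settings["worker_max_memory_per_child"] = max_memory_mib * 1024
    return settings


# Usage with an existing Celery app object (not created here):
#   app.conf.update(worker_recycling_settings(max_memory_mib=2048))
```

Because the memory limit only takes effect between tasks, the task-count limit is the more predictable lever when individual point-cloud or HySpex tasks are large.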

HySpex Campaign Stops Midway

Use the aggregate endpoint:

curl "http://127.0.0.1:8080/jobs/hyspex-csw-campaigns/<campaign_id>"

Recover by reusing the same campaign_id and the reported next_start_position.
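
A minimal sketch of reading the resume point from the aggregate endpoint follows. It assumes the endpoint returns a JSON object with top-level campaign_id and next_start_position fields; the exact response shape is not documented here, so adjust the keys to what the endpoint actually returns. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8080"


def fetch_campaign_status(campaign_id, base_url=BASE_URL):
    """GET the aggregate campaign status and decode it as JSON."""
    safe_id = urllib.parse.quote(str(campaign_id), safe="")
    url = f"{base_url}/jobs/hyspex-csw-campaigns/{safe_id}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def resume_point(status):
    """Extract (campaign_id, next_start_position) from a status payload."""
    # Assumed field names; fail loudly if the payload looks different.
    required = ("campaign_id", "next_start_position")
    missing = [key for key in required if key not in status]
    if missing:
        raise KeyError(f"status payload lacks: {', '.join(missing)}")
    return status["campaign_id"], status["next_start_position"]
```

Calling resume_point(fetch_campaign_status("<campaign_id>")) yields the two values to reuse when restarting the campaign.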

For the normal operator sequence, see Operator Workflows. For the full queue and request handoff, see Request Flows.

Persisted Config Ignores New Environment Changes

This is expected when /data/server_config.json contains the same key: the persisted runtime configuration overrides environment defaults. Either update the value through /config or remove the stale key from the persisted file.
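
The precedence rule can be illustrated with a short sketch. It assumes the persisted file is a flat JSON object of key/value pairs, and the helper names are hypothetical. It is a way to spot which environment changes are being shadowed, not the server's actual merge code.

```python
import json


def effective_config(env_defaults, persisted):
    """Merge the two sources; persisted keys win over environment defaults."""
    return {**env_defaults, **persisted}


def shadowed_keys(env_defaults, persisted):
    """List keys whose environment value is ignored by a persisted value."""
    return sorted(
        key for key in env_defaults
        if key in persisted and persisted[key] != env_defaults[key]
    )


def load_persisted(path="/data/server_config.json"):
    """Read the persisted runtime configuration from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, if the environment sets wms_use_mapcache to true while the persisted file still holds false, shadowed_keys reports wms_use_mapcache and the effective value stays false until the persisted key is updated or removed.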

UI Pagination or Refresh Feels Wrong

Known behavior in the current frontend:

  • the dataset table can fall out of sync after heavy interaction or when new data has been inserted
  • a full browser refresh is sometimes the fastest recovery path
  • jobs and campaign monitoring are the more reliable live-status surfaces for long-running work