HySpex Campaigns and Bulk Ingestion

Problem This Workflow Solves

Large HySpex parent collections can contain hundreds or thousands of children. Submitting them all in one huge request is operationally brittle: if it fails partway through, there is no clean point from which to restart or recover.

The project now supports deterministic, resumable batching through:

  • start_position
  • campaign_id
  • campaign_total_children
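The helper below is a minimal sketch of why this batching is deterministic and resumable. It assumes that start_position is 1-based (the sequential pattern starts at start_position=1) and that each batch covers at most batch_size consecutive children. The names batch_windows, BatchWindow and batch_size are hypothetical and are not part of the project's API.

```python
from typing import Iterator, NamedTuple


class BatchWindow(NamedTuple):
    """One parent batch: a contiguous, 1-based range of child positions."""
    start_position: int
    end_position: int         # inclusive
    next_start_position: int  # where the following batch would begin


def batch_windows(total_children: int, batch_size: int,
                  start_position: int = 1) -> Iterator[BatchWindow]:
    """Yield batch windows from start_position up to total_children.

    Hypothetical helper: the same inputs always produce the same windows,
    so a run that stopped after any batch can resume by passing that
    batch's next_start_position as start_position.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if start_position < 1:
        raise ValueError("start_position is 1-based")
    position = start_position
    while position <= total_children:
        end = min(position + batch_size - 1, total_children)
        yield BatchWindow(position, end, end + 1)
        position = end + 1


if __name__ == "__main__":
    for window in batch_windows(total_children=100, batch_size=40):
        print(window)
```

Resuming with start_position=41 yields exactly the last two windows of the full run. This is the property that lets an interrupted campaign continue without re-queuing finished children.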

Core Endpoints

Route                                           Purpose
POST /jobs/hyspex-csw-parent                    queue one parent batch
GET /jobs/hyspex-csw-campaigns/{campaign_id}    aggregate campaign status
POST /datasets/from-hyspex-parent               synchronous or API-level bulk parent processing
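The function below sketches how a client could build the request for POST /jobs/hyspex-csw-parent using only the standard library. It builds the request but does not send it. Only start_position, campaign_id and campaign_total_children come from this page. The parent_id and batch_size payload keys and the Bearer authorization scheme are assumptions, so check them against the real API schema.

```python
import json
import urllib.request


def build_parent_batch_request(base_url: str, token: str, parent_id: str,
                               campaign_id: str, start_position: int,
                               batch_size: int, total_children: int
                               ) -> urllib.request.Request:
    """Build (but do not send) a POST /jobs/hyspex-csw-parent request.

    ASSUMPTIONS: the "parent_id" and "batch_size" payload keys and the
    Bearer auth scheme are illustrative guesses, not a documented schema.
    """
    payload = {
        "parent_id": parent_id,            # hypothetical key
        "batch_size": batch_size,          # hypothetical key
        "start_position": start_position,  # 1-based position of first child
        "campaign_id": campaign_id,        # same value for every batch
        "campaign_total_children": total_children,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jobs/hyspex-csw-parent",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
        method="POST",
    )
```

Generate the campaign ID once, for example with str(uuid.uuid4()), and pass the same value for every batch. Then send each request with urllib.request.urlopen(req).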

Sequential Batching Pattern

The safe pattern is:

  1. queue a parent batch with start_position=1
  2. wait for the parent job to finish
  3. wait for all child jobs in that batch to finish
  4. read next_start_position
  5. queue the next batch with the same campaign_id
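The five steps above can be sketched as a single loop. The HTTP calls are passed in as callables, which keeps the control flow testable without a server. This is not the code of scripts/run_hyspex_parent_batches.py. The names run_sequential_campaign, submit_batch, wait_for_parent and wait_for_children are hypothetical. The sketch also assumes two things: the finished parent job reports next_start_position, and the campaign ends once that position exceeds the total number of children.

```python
from typing import Callable, List


def run_sequential_campaign(
    campaign_id: str,
    total_children: int,
    submit_batch: Callable[[str, int], str],
    wait_for_parent: Callable[[str], int],
    wait_for_children: Callable[[str], None],
    start_position: int = 1,
) -> List[int]:
    """Run parent batches strictly one at a time; return start positions used.

    submit_batch(campaign_id, start_position) -> parent job ID
    wait_for_parent(job_id) -> next_start_position reported by the parent
    wait_for_children(job_id) blocks until every child job is terminal
    All three callables are hypothetical stand-ins for API calls.
    """
    used: List[int] = []
    position = start_position
    while position <= total_children:
        job_id = submit_batch(campaign_id, position)  # steps 1 and 5
        used.append(position)
        next_position = wait_for_parent(job_id)       # step 2
        wait_for_children(job_id)                     # step 3
        if next_position <= position:                 # guard against no progress
            raise RuntimeError(f"no progress after start_position={position}")
        position = next_position                      # step 4
    return used
```

Because the next batch is submitted only after both waits return, at most one batch is in flight at any time. Passing a later start_position resumes the same campaign_id from that point.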

That is exactly what both of these clients now do:

  • scripts/run_hyspex_parent_batches.py
  • the browser-side Start Sequential Campaign action in /ui/

Browser Campaign Runner

The frontend modal can now:

  • generate or reuse a campaign ID
  • submit sequential parent batches directly from the browser
  • poll parent and child jobs until terminal
  • stop after the current polling cycle when the operator clicks Stop Campaign
  • keep the campaign monitor synchronized while the run is active
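The browser runner is JavaScript. For consistency with the other examples on this page, its "stop after the current polling cycle" behavior is sketched here in Python. The stop flag is checked only between cycles, so a cycle that has started always completes. The terminal status names and the get_status callable are hypothetical.

```python
import threading
from typing import Callable, Dict, Iterable

# Hypothetical terminal status names; the real API may use different ones.
TERMINAL_STATES = frozenset({"succeeded", "failed", "cancelled"})


def poll_until_terminal(job_ids: Iterable[str],
                        get_status: Callable[[str], str],
                        stop: threading.Event,
                        interval_s: float = 5.0) -> Dict[str, str]:
    """Poll jobs until all are terminal or a stop is requested.

    Returns the last status seen for each job.
    """
    pending = set(job_ids)
    statuses: Dict[str, str] = {}
    while pending:
        for job_id in sorted(pending):            # one full polling cycle
            statuses[job_id] = get_status(job_id)
        pending = {j for j in pending if statuses[j] not in TERMINAL_STATES}
        if not pending:
            break
        # Event.wait returns True as soon as stop is set (or if it already
        # was), so a Stop request ends the loop without starting a new cycle.
        if stop.wait(interval_s):
            break
    return statuses
```

Using Event.wait instead of a plain sleep means a Stop click during the pause takes effect immediately. The run does not have to wait out the full interval.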

Aggregate Campaign Monitor

The jobs card in /ui/ includes a HySpex Campaign Monitor that summarizes:

  • number of parent batches
  • total child jobs
  • number of unique children
  • successful, failed, or still-running child groups
  • latest next_start_position
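The function below sketches how those monitor figures could be derived from per-batch job records. The record shape is hypothetical. So is the grouping rule: child jobs are grouped by child ID, and a group counts as running while any attempt is non-terminal, as successful if any attempt succeeded, and as failed otherwise. The sketch does not describe how the /ui/ monitor actually computes them.

```python
from collections import defaultdict
from typing import Dict, List

TERMINAL = {"succeeded", "failed", "cancelled"}  # hypothetical status names


def summarize_campaign(batches: List[dict]) -> Dict[str, object]:
    """Aggregate parent-batch records into monitor-style counters.

    Each batch is assumed (hypothetically) to look like:
      {"start_position": 1, "next_start_position": 41,
       "children": [{"child_id": "...", "status": "succeeded"}, ...]}
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    total_child_jobs = 0
    for batch in batches:
        for child in batch["children"]:
            groups[child["child_id"]].append(child["status"])
            total_child_jobs += 1

    counts = {"succeeded": 0, "failed": 0, "running": 0}
    for statuses in groups.values():
        if any(s not in TERMINAL for s in statuses):
            counts["running"] += 1    # some attempt still in flight
        elif "succeeded" in statuses:
            counts["succeeded"] += 1  # at least one attempt succeeded
        else:
            counts["failed"] += 1     # every attempt ended unsuccessfully

    # Treat the batch with the highest start_position as the latest one.
    latest = max(batches, key=lambda b: b["start_position"], default=None)
    return {
        "parent_batches": len(batches),
        "total_child_jobs": total_child_jobs,
        "unique_children": len(groups),
        **counts,
        "latest_next_start_position":
            latest["next_start_position"] if latest else None,
    }
```

The gap between total child jobs and unique children shows how many children were queued more than once, for example after a retry.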

Practical Example

python3 scripts/run_hyspex_parent_batches.py \
  --base-url http://127.0.0.1:8080 \
  --token demo \
  --parent-id no.met.adc:fd40898c-7e06-5337-b915-b99d0631e5fb \
  --total-children 1000 \
  --batch-size 40
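Assume each batch advances next_start_position by exactly --batch-size children; the script's actual stepping may differ. Under that assumption, the example above works out as follows:

```latex
\[
  B = \left\lceil \frac{1000}{40} \right\rceil = 25,
  \qquad
  s_k = 1 + 40\,(k - 1), \quad k = 1, \dots, 25,
\]
\[
  s_1 = 1,\quad s_2 = 41,\quad \dots,\quad s_{25} = 1 + 40 \cdot 24 = 961.
\]
```

The last batch covers children 961 to 1000. Its next_start_position of 1001 exceeds --total-children, which ends the campaign.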

Operational Guidance

  • keep batches sequential, not concurrent
  • use moderate batch sizes such as 25 to 50 for very large campaigns
  • treat task-per-child worker recycling as the main safety control under sustained ingestion load
  • use campaign IDs consistently so you can recover state from the aggregate endpoint at any time
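A sketch of recovering from the aggregate endpoint after an interruption. The first function builds the GET /jobs/hyspex-csw-campaigns/{campaign_id} URL. The second decides where a restarted run should continue. The response keys latest_next_start_position and running are hypothetical stand-ins for whatever the endpoint actually returns.

```python
import json
import urllib.parse
from typing import Optional


def campaign_status_url(base_url: str, campaign_id: str) -> str:
    """URL of the aggregate endpoint for one campaign (ID is percent-encoded)."""
    return (base_url.rstrip("/") + "/jobs/hyspex-csw-campaigns/"
            + urllib.parse.quote(campaign_id, safe=""))


def resume_position(summary_json: str, total_children: int) -> Optional[int]:
    """Return the start_position to resume from, or None if the campaign is done.

    ASSUMPTION: the response carries hypothetical keys
    "latest_next_start_position" and "running"; adapt to the real schema.
    """
    summary = json.loads(summary_json)
    if summary.get("running", 0) > 0:
        # Stay sequential: never queue a new batch while children from
        # the previous batch are still in flight.
        raise RuntimeError("previous batch still running; wait before resuming")
    # No batches recorded yet means the campaign starts from the beginning.
    position = summary.get("latest_next_start_position") or 1
    return position if position <= total_children else None
```

Refusing to resume while children are still running keeps recovery consistent with the "sequential, not concurrent" rule.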

Architecture Views

For the structural and machine-level diagrams behind this workflow, see C4 Architecture and Workflow Diagrams.