HySpex Campaigns and Bulk Ingestion
Problem This Workflow Solves
Large HySpex parent collections can contain hundreds or thousands of children. Submitting one huge request is operationally brittle and makes restart or recovery difficult.
The project now supports deterministic, resumable batching through:
- start_position
- campaign_id
- campaign_total_children
Core Endpoints
| Route | Purpose |
|---|---|
| POST /jobs/hyspex-csw-parent | queue one parent batch |
| GET /jobs/hyspex-csw-campaigns/{campaign_id} | aggregate campaign status |
| POST /datasets/from-hyspex-parent | synchronous or API-level bulk parent processing |
Sequential Batching Pattern
The safe pattern is:
- queue a parent batch with start_position=1
- wait for the parent job to finish
- wait for all child jobs in that batch to finish
- read next_start_position
- queue the next batch with the same campaign_id
That is exactly what both of these clients now do:
- scripts/run_hyspex_parent_batches.py
- the browser-side Start Sequential Campaign action in /ui/
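The sequential pattern described above can be sketched as a small loop. This is an illustration only, not the code of scripts/run_hyspex_parent_batches.py: the callables submit_batch and wait_until_done are hypothetical stand-ins for the real HTTP calls, and the sketch assumes that the finished batch reports next_start_position and that a missing or null value means the campaign is complete.

```python
from typing import Any, Callable, Dict, List, Optional


def run_sequential_campaign(
    submit_batch: Callable[[str, int], Any],
    wait_until_done: Callable[[Any], Dict[str, Any]],
    campaign_id: str,
    start_position: int = 1,
) -> List[int]:
    """Queue batches one at a time and return the start positions that were used."""
    used: List[int] = []
    position: Optional[int] = start_position
    while position is not None:
        # Queue exactly one parent batch under the shared campaign ID.
        job = submit_batch(campaign_id, position)
        # Block until the parent job and all of its child jobs are terminal.
        result = wait_until_done(job)
        used.append(position)
        # Assumption: the result carries next_start_position, or None when done.
        next_position = result.get("next_start_position")
        if next_position is not None and next_position <= position:
            raise RuntimeError("next_start_position did not advance")
        position = next_position
    return used


# In-memory stand-ins (hypothetical) so the sketch runs without a server.
def fake_submit(campaign_id: str, start_position: int) -> Dict[str, int]:
    return {"start_position": start_position}


def fake_wait(job: Dict[str, int]) -> Dict[str, Any]:
    # Pretend each batch covers 40 children out of 100 in total.
    nxt = job["start_position"] + 40
    return {"next_start_position": nxt if nxt <= 100 else None}


positions = run_sequential_campaign(fake_submit, fake_wait, "demo-campaign")
print(positions)
```

Because each batch is submitted only after the previous one has fully finished, an interrupted run can be restarted by passing the last reported next_start_position as start_position with the same campaign ID.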
Browser Campaign Runner
The frontend modal can now:
- generate or reuse a campaign ID
- submit sequential parent batches directly from the browser
- poll parent and child jobs until terminal
- stop after the current polling cycle when the operator clicks Stop Campaign
- keep the campaign monitor synchronized while the run is active
Aggregate Campaign Monitor
The jobs card in /ui/ includes a HySpex Campaign Monitor that summarizes:
- number of parent batches
- total child jobs
- number of unique children
- successful, failed, or still-running child groups
- latest next_start_position
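The same summary can be read programmatically from the aggregate endpoint listed under Core Endpoints. The sketch below only builds the request; how the token is transmitted (a Bearer header here) and the shape of the JSON response are assumptions, since this page does not specify them.

```python
import json
import urllib.request
from urllib.parse import quote


def campaign_status_url(base_url: str, campaign_id: str) -> str:
    """Build the aggregate status URL for one campaign."""
    # Percent-encode the campaign ID so unusual characters cannot break the path.
    return f"{base_url.rstrip('/')}/jobs/hyspex-csw-campaigns/{quote(campaign_id, safe='')}"


def fetch_campaign_status(
    base_url: str, campaign_id: str, token: str, timeout: float = 30.0
) -> dict:
    """Fetch the aggregate campaign status as a dictionary."""
    request = urllib.request.Request(
        campaign_status_url(base_url, campaign_id),
        # Assumption: the API accepts the token as a Bearer header.
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here, so the example runs without a live server.
url = campaign_status_url("http://127.0.0.1:8080/", "campaign 01")
print(url)
```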
Practical Example
python3 scripts/run_hyspex_parent_batches.py \
--base-url http://127.0.0.1:8080 \
--token demo \
--parent-id no.met.adc:fd40898c-7e06-5337-b915-b99d0631e5fb \
--total-children 1000 \
--batch-size 40
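To get a feel for how many batches such a run involves, the arithmetic can be sketched as follows. This assumes that positions are 1-based (as start_position=1 suggests) and that every batch advances by exactly the batch size; the real script follows the next_start_position reported by the server, which may differ. The helper batch_windows is hypothetical.

```python
from typing import List, Tuple


def batch_windows(
    total_children: int, batch_size: int, first_position: int = 1
) -> List[Tuple[int, int]]:
    """Return (start_position, child_count) for each consecutive batch."""
    if total_children < 0 or batch_size <= 0:
        raise ValueError("total_children must be >= 0 and batch_size must be > 0")
    windows: List[Tuple[int, int]] = []
    position = first_position
    last = first_position + total_children - 1  # position of the final child
    while position <= last:
        # The final batch may be smaller than batch_size.
        count = min(batch_size, last - position + 1)
        windows.append((position, count))
        position += count
    return windows


windows = batch_windows(1000, 40)
print(len(windows), windows[0], windows[-1])
```

Under these assumptions, 1000 children at a batch size of 40 yield 25 sequential batches, the last one starting at position 961.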
Operational Guidance
- keep batches sequential, not concurrent
- use moderate batch sizes such as 25 to 50 for very large campaigns
- treat task-per-child worker recycling as the main safety control under sustained ingestion load
- use campaign IDs consistently so you can recover state from the aggregate endpoint at any time
Architecture Views
For the structural and machine-level diagrams behind this workflow, see C4 Architecture and Workflow Diagrams.