Datasets and Catalog Workflows
Supported Ingestion Modes
The service supports several ways to create or register datasets:
| Workflow | Route | Typical use |
|---|---|---|
| Structured CSW search | POST /datasets/csw-search | page through catalogue results before choosing identifiers to ingest |
| Upload GeoTIFF | POST /datasets | manual ingestion of a prepared raster |
| Register existing file | POST /datasets/from-string | register a file that already exists on the server |
| Point-cloud config conversion | POST /datasets/from-point-cloud-config | generate a raster from NetCDF or point-cloud backed configuration |
| HySpex CSW single record | POST /datasets/hyspex-csw | create a single HySpex dataset payload or job |
| HySpex parent bulk | POST /datasets/from-hyspex-parent or POST /jobs/hyspex-csw-parent | ingest many HySpex children through parent pagination |
Dataset Registry Model
Each registered dataset is persisted in /data/datasets.json and represented in memory through the shared DATASETS state registry.
Important dataset attributes include:
- dataset_id
- source file path and optional source dataset URL
- band count and metadata
- CRS and extent
- WMS and WCS endpoints
- derived products such as terrain tiles or thumbnails
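
As a rough illustration of the registry model described above, the sketch below round-trips a dataset record through a JSON file. Only dataset_id and the general attribute categories come from this document; every other field name (source_path, band_count, wms_url, and so on), the DatasetRecord class, and the load/save helpers are hypothetical and may not match the real on-disk layout of /data/datasets.json.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical record shape; only dataset_id is named in the docs."""
    dataset_id: str
    source_path: str
    source_url: Optional[str] = None
    band_count: int = 1
    crs: Optional[str] = None
    extent: Optional[list] = None  # assumed order: [minx, miny, maxx, maxy]
    wms_url: Optional[str] = None
    wcs_url: Optional[str] = None
    derived: dict = field(default_factory=dict)  # e.g. thumbnails, terrain tiles


def save_registry(datasets: dict, path: Path) -> None:
    """Write the in-memory registry to disk as a JSON object keyed by id."""
    payload = {ds_id: asdict(rec) for ds_id, rec in datasets.items()}
    path.write_text(json.dumps(payload, indent=2))


def load_registry(path: Path) -> dict:
    """Read the registry back; a missing file yields an empty registry."""
    if not path.exists():
        return {}
    raw = json.loads(path.read_text())
    return {ds_id: DatasetRecord(**fields) for ds_id, fields in raw.items()}
```

Keying the file by dataset_id keeps lookups by identifier cheap after loading, which is one plausible reason to mirror the same mapping in a shared in-memory registry.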
Common Read Operations
| Route | Purpose |
|---|---|
| GET /datasets | list datasets with pagination, search, sort, and optional filters |
| POST /datasets/csw-search | search a CSW catalogue with structured filters and paginated results |
| GET /datasets/{dataset_id} | inspect one dataset payload |
| GET /datasets/{dataset_id}/sample | sample values at lon and lat |
| GET /datasets/{dataset_id}/extent.geojson | dataset footprint |
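
The read routes above can be addressed with a couple of small URL helpers. This is a minimal sketch: the base URL is a placeholder, the helper names are hypothetical, and passing lon and lat as query parameters is an assumption, since the table only says the sample route takes a lon and a lat.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # placeholder; adjust to your deployment


def dataset_url(dataset_id: str, suffix: str = "") -> str:
    """Build the URL for one dataset, percent-encoding the identifier."""
    return f"{BASE_URL}/datasets/{quote(dataset_id, safe='')}{suffix}"


def sample_url(dataset_id: str, lon: float, lat: float) -> str:
    """Build the sample URL; lon/lat as query parameters is an assumption."""
    query = urlencode({"lon": lon, "lat": lat})
    return dataset_url(dataset_id, "/sample") + "?" + query


def extent_url(dataset_id: str) -> str:
    """Build the URL of the dataset footprint as GeoJSON."""
    return dataset_url(dataset_id, "/extent.geojson")
```

Encoding the identifier with `safe=''` matters when dataset identifiers contain slashes or other reserved characters, which would otherwise be read as extra path segments.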
Catalog Search Pattern
Use POST /datasets/csw-search when you want the API to build a CSW FES filter for you instead of hand-assembling query XML.
- Provide any subset of keywords, start_datetime, end_datetime, bbox, and anytext.
- The current implementation combines supplied filters with logical AND.
- Use limit and offset to page through large result sets without pulling the whole catalogue into one response.
- Take the returned identifier values and pass them into Sentinel or HySpex registration endpoints when you are ready to ingest.
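
The paging pattern above might look like the following sketch. The filter names, limit, offset, and identifier come from this section; the response shape (a "results" list of objects) is an assumption, and `post` is a hypothetical callable standing in for whatever HTTP client sends the body to POST /datasets/csw-search and returns the decoded JSON.

```python
from typing import Callable, Iterator


def build_search_body(keywords=None, start_datetime=None, end_datetime=None,
                      bbox=None, anytext=None, limit=50, offset=0) -> dict:
    """Assemble a request body, leaving out filters that were not supplied."""
    filters = {
        "keywords": keywords,
        "start_datetime": start_datetime,
        "end_datetime": end_datetime,
        "bbox": bbox,
        "anytext": anytext,
    }
    body = {name: value for name, value in filters.items() if value is not None}
    body["limit"] = limit
    body["offset"] = offset
    return body


def iter_identifiers(post: Callable[[dict], dict], page_size: int = 50,
                     **filters) -> Iterator[str]:
    """Yield identifiers page by page until a short page signals the end."""
    offset = 0
    while True:
        page = post(build_search_body(limit=page_size, offset=offset, **filters))
        records = page.get("results", [])  # "results" key is an assumption
        for record in records:
            yield record["identifier"]
        if len(records) < page_size:
            break
        offset += page_size
```

Because the function is a generator, a caller can stop iterating as soon as it has found the identifiers it needs, and no further pages are requested.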
Frontend Behavior
The built-in UI exposes these dataset operations:
- upload with progress tracking
- search and filter datasets
- inspect metadata and extents
- generate thumbnails
- trigger Mago terrain generation
- seed MapCache for selected datasets
- detect duplicates and missing HySpex variants
Deduplication and Variant Caveats
Operational notes from the current implementation:
- duplicate detection is based on dataset identity and source conventions; it can block repeat submissions unless duplicates are explicitly allowed
- HySpex workflows track RGB and Z variants separately and can report missing companions
- stale source files are pruned from the dataset registry at startup
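
The three behaviours noted above can be approximated as follows. This is a sketch under stated assumptions, not the service's actual logic: the registry is modelled as a plain dict of dicts, and the keys source_path, scene, and variant, the allow_duplicates flag, and all three function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def prune_stale(datasets: dict) -> list:
    """Drop entries whose source file no longer exists; return removed ids."""
    stale = [ds_id for ds_id, rec in datasets.items()
             if not Path(rec["source_path"]).exists()]
    for ds_id in stale:
        del datasets[ds_id]
    return stale


def find_duplicate(datasets: dict, source_path: str,
                   allow_duplicates: bool = False) -> Optional[str]:
    """Return the id of an entry with the same source path, if one blocks us."""
    if allow_duplicates:
        return None
    for ds_id, rec in datasets.items():
        if rec["source_path"] == source_path:
            return ds_id
    return None


def missing_variants(datasets: dict, expected=("RGB", "Z")) -> dict:
    """Map each scene to the expected variants that have no registered entry."""
    seen = {}
    for rec in datasets.values():
        if "scene" in rec and "variant" in rec:
            seen.setdefault(rec["scene"], set()).add(rec["variant"])
    report = {}
    for scene, variants in seen.items():
        gaps = sorted(set(expected) - variants)
        if gaps:
            report[scene] = gaps
    return report
```

Comparing on source path alone is the simplest possible identity rule; the real implementation is described as also using dataset identity and source conventions, so it is likely stricter than this.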