Datasets and Catalog Workflows

Supported Ingestion Modes

The service supports several ways to create or register datasets:

| Workflow | Route | Typical use |
| --- | --- | --- |
| Structured CSW search | POST /datasets/csw-search | page through catalogue results before choosing identifiers to ingest |
| Upload GeoTIFF | POST /datasets | manual ingestion of a prepared raster |
| Register existing file | POST /datasets/from-string | register a file that already exists on the server |
| Point-cloud config conversion | POST /datasets/from-point-cloud-config | generate a raster from NetCDF or point-cloud backed configuration |
| HySpex CSW single record | POST /datasets/hyspex-csw | create a single HySpex dataset payload or job |
| HySpex parent bulk | POST /datasets/from-hyspex-parent or POST /jobs/hyspex-csw-parent | ingest many HySpex children through parent pagination |

Dataset Registry Model

Each registered dataset is persisted in /data/datasets.json and represented in memory through the shared DATASETS state registry.

Important dataset attributes include:

  • dataset_id
  • source file path and optional source dataset URL
  • band count and metadata
  • CRS and extent
  • WMS and WCS endpoints
  • derived products such as terrain tiles or thumbnails
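
As an illustration of the attributes listed above, a record could be modelled roughly as follows. This is a minimal sketch, not the service's actual schema: every field name except dataset_id is hypothetical, and the assumption that datasets.json is a JSON object keyed by dataset_id is ours.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical in-memory shape of one registry entry."""
    dataset_id: str
    source_path: str                      # assumed name for the source file path
    source_url: Optional[str] = None      # optional source dataset URL
    band_count: int = 0
    metadata: Dict[str, Any] = field(default_factory=dict)
    crs: Optional[str] = None
    extent: Optional[List[float]] = None  # assumed [min_x, min_y, max_x, max_y]
    wms_url: Optional[str] = None
    wcs_url: Optional[str] = None
    derived: Dict[str, Any] = field(default_factory=dict)  # e.g. thumbnails


def load_registry(text: str) -> Dict[str, DatasetRecord]:
    """Parse registry JSON (assumed: object keyed by dataset_id) into records."""
    raw = json.loads(text)
    registry: Dict[str, DatasetRecord] = {}
    for dataset_id, fields in raw.items():
        # Drop a redundant dataset_id inside the entry, if one is present.
        extra = {k: v for k, v in fields.items() if k != "dataset_id"}
        registry[dataset_id] = DatasetRecord(dataset_id=dataset_id, **extra)
    return registry
```

In a deployment, the text passed to load_registry would be the contents of /data/datasets.json.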

Common Read Operations

| Route | Purpose |
| --- | --- |
| GET /datasets | list datasets with pagination, search, sort, and optional filters |
| POST /datasets/csw-search | search a CSW catalogue with structured filters and paginated results |
| GET /datasets/{dataset_id} | inspect one dataset payload |
| GET /datasets/{dataset_id}/sample | sample values at lon and lat |
| GET /datasets/{dataset_id}/extent.geojson | dataset footprint |
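
The per-dataset read routes can be addressed with small URL helpers. The sketch below only builds URLs and sends nothing. The base URL is a placeholder, and passing lon and lat as query parameters is an assumption based on the route description rather than a confirmed request format.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # placeholder; substitute your deployment


def dataset_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /datasets/{dataset_id}."""
    # Percent-encode the identifier so slashes or spaces cannot alter the path.
    return f"{base_url}/datasets/{quote(dataset_id, safe='')}"


def sample_url(dataset_id: str, lon: float, lat: float,
               base_url: str = BASE_URL) -> str:
    """URL for GET /datasets/{dataset_id}/sample (query parameters assumed)."""
    query = urlencode({"lon": lon, "lat": lat})
    return f"{dataset_url(dataset_id, base_url)}/sample?{query}"


def extent_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /datasets/{dataset_id}/extent.geojson."""
    return f"{dataset_url(dataset_id, base_url)}/extent.geojson"
```

The resulting strings can be passed to any HTTP client.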

Catalog Search Pattern

Use POST /datasets/csw-search when you want the API to build a CSW FES filter for you instead of hand-assembling query XML.

  • Provide any subset of keywords, start_datetime, end_datetime, bbox, and anytext.
  • The current implementation combines supplied filters with logical AND.
  • Use limit and offset to page through large result sets without pulling the whole catalogue into one response.
  • Take the returned identifier values and pass them into Sentinel or HySpex registration endpoints when you are ready to ingest.
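
The pattern above can be sketched as a request-body builder plus a paging loop. The filter names, limit, and offset come from the list above. The JSON body layout, the bbox ordering, and the response shape (a "results" list of objects with an "identifier" key) are assumptions to check against the actual API. The post argument is a hypothetical callable that sends the body to POST /datasets/csw-search and returns the decoded JSON.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Sequence


def build_csw_search_body(
    keywords: Optional[Sequence[str]] = None,
    start_datetime: Optional[str] = None,
    end_datetime: Optional[str] = None,
    bbox: Optional[Sequence[float]] = None,  # assumed [min_lon, min_lat, max_lon, max_lat]
    anytext: Optional[str] = None,
    limit: int = 50,
    offset: int = 0,
) -> Dict[str, Any]:
    """Include only the filters that were supplied; the API ANDs them."""
    body: Dict[str, Any] = {"limit": limit, "offset": offset}
    if keywords:
        body["keywords"] = list(keywords)
    if start_datetime is not None:
        body["start_datetime"] = start_datetime
    if end_datetime is not None:
        body["end_datetime"] = end_datetime
    if bbox is not None:
        body["bbox"] = list(bbox)
    if anytext is not None:
        body["anytext"] = anytext
    return body


def iter_identifiers(
    post: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 50,
    **filters: Any,
) -> Iterator[str]:
    """Yield identifiers page by page until a short or empty page is returned."""
    offset = 0
    while True:
        page = post(build_csw_search_body(limit=page_size, offset=offset, **filters))
        records = page.get("results", [])  # assumed response key
        for record in records:
            yield record["identifier"]
        if len(records) < page_size:
            return
        offset += page_size
```

Because iter_identifiers is a generator, callers can stop early once they have found the identifiers they want to ingest, without fetching the remaining pages.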

Frontend Behavior

The built-in UI exposes these dataset operations:

  • upload with progress tracking
  • search and filter datasets
  • inspect metadata and extents
  • generate thumbnails
  • trigger Mago terrain generation
  • seed MapCache for selected datasets
  • detect duplicates and missing HySpex variants

Deduplication and Variant Caveats

Operational notes from the current implementation:

  • duplicate detection is based on dataset identity and source conventions and can block repeat submissions unless duplicates are explicitly allowed
  • HySpex workflows track RGB and Z variants separately and can report missing companions
  • stale source files are pruned from the dataset registry at startup
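
The startup pruning described in the last note could look roughly like the sketch below. It is an illustration under assumptions, not the service's code: the registry is taken to be a dict keyed by dataset_id, and the source_path key is a hypothetical name for the source file path.

```python
import os
from typing import Any, Callable, Dict, List, Tuple


def prune_stale(
    registry: Dict[str, Dict[str, Any]],
    exists: Callable[[str], bool] = os.path.exists,
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Split the registry into kept entries and IDs whose source file is gone."""
    kept: Dict[str, Dict[str, Any]] = {}
    removed: List[str] = []
    for dataset_id, record in registry.items():
        # "source_path" is an assumed key name for the source file path.
        if exists(record["source_path"]):
            kept[dataset_id] = record
        else:
            removed.append(dataset_id)
    return kept, removed
```

Passing the existence check as a parameter keeps the function testable without touching the file system.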