TL;DR — STAC Zarr Best Practices
- Use STAC Items for single scenes or time slices, and STAC Collections for datasets spanning multiple times/regions. Each Item or Collection may reference one Zarr store.
- One STAC asset = one Zarr group (not individual arrays). Arrays and subgroups live inside the asset’s Zarr hierarchy.
- Always link the Zarr store using rel: store, pointing to the root of the (native or virtual) Zarr store. All Zarr assets are assumed to live under this store.
- Use the correct Zarr media type with version:
  - application/vnd.zarr; version=2
  - application/vnd.zarr; version=3
- Add profile=multiscales when publishing multiscale Zarr assets.
- Do not expose arrays as assets.
- Expose bands via the bands array:
  - One variable = one band → name = variable name
  - One variable, many bands → encode band selection in name
  - Multiscales → bands are resolution-agnostic; resolution is inferred from the Zarr layout
- Asset href always points to a Zarr group, never directly to an array.
- Clients access arrays by path-joining asset.href + band.name.
- For multiresolution data:
  - Either expose one asset per resolution, or
  - Expose a single multiscales asset pointing to the parent group (preferred when resolutions are tightly coupled)
- Use STAC extensions consistently:
  - Datacube: describe variables and dimensions (cube:variables, cube:dimensions)
  - Projection: spatial reference (proj:*)
  - Raster: raster properties (resolution, nodata, dtype)
  - CF: climate/forecast semantics (cf:standard_name, units, etc.)
- Virtual Zarr stores (Kerchunk, VirtualiZarr, icechunk):
  - Treat them like native Zarr
  - rel: store points to the reference/entrypoint
  - Assets may carry role "virtual"
  - Source files may be referenced separately with role "source"
- Link Templates MAY be used to advertise variable-level access without enumerating arrays as assets.
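The media type rules above can be sketched as a small helper. This is a minimal illustration, not part of any library: the function name zarr_media_type is hypothetical, and the way the profile parameter is appended after the version parameter is an assumption.

```python
def zarr_media_type(version: int, multiscales: bool = False) -> str:
    """Build a Zarr media type string for a STAC asset (illustrative helper)."""
    if version not in (2, 3):
        raise ValueError("Zarr version must be 2 or 3")
    media_type = f"application/vnd.zarr; version={version}"
    if multiscales:
        # Assumption: the profile is appended as a further media type parameter.
        media_type += "; profile=multiscales"
    return media_type


plain_v2 = zarr_media_type(2)
multiscale_v3 = zarr_media_type(3, multiscales=True)
```

Centralising the string in one place avoids the common mistake of publishing a bare application/vnd.zarr type without a version.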
In short:
- STAC describes what is in the Zarr store, not how to traverse it.
- Zarr handles structure; STAC handles discovery, semantics, and access hints.
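As a rough sketch of the client-side rule "path-join asset.href + band.name", the snippet below resolves array locations from an asset. The store URL, the band names, and the helper name array_href are hypothetical and only serve the illustration.

```python
def array_href(asset_href: str, band_name: str) -> str:
    """Join an asset href (a Zarr group) with a band name (an array path)."""
    return f"{asset_href.rstrip('/')}/{band_name.lstrip('/')}"


# Hypothetical asset: the href points to a group, never to an array.
asset = {
    "href": "https://example.com/data/example.zarr/measurements",
    "bands": [{"name": "b04"}, {"name": "b08"}],
}

array_hrefs = [array_href(asset["href"], band["name"]) for band in asset["bands"]]
```

The arrays stay discoverable through bands without ever being listed as assets themselves.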
Current Implementation in This Repository
The stac-zarr tool implements the following conventions and metadata patterns for Zarr v3 outputs.
STAC-side implementation
- rel: store link on the output Collection pointing to <collection-id>.zarr
- measurements STAC asset pointing to <collection-id>.zarr/measurements
- Zarr media type includes profile=multiscales
- Datacube metadata at asset level using:
  - cube:variables
  - cube:dimensions
- Projection metadata at asset level (proj:*)
- Raster metadata at asset level (raster:bands)
- STAC extension URIs pinned to:
  - projection v2.0.0
  - raster v2.0.0
  - datacube v2.2.0
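A hedged sketch of what the resulting Collection skeleton might look like, written as a Python dict. The function name collection_skeleton is hypothetical, the extension URLs assume the usual stac-extensions.github.io schema pattern, placing the measurements asset at Collection level is an assumption, and the empty cube:* values are placeholders rather than real tool output.

```python
# Assumption: pinned versions follow the standard stac-extensions URL pattern.
STAC_EXTENSIONS = [
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "https://stac-extensions.github.io/raster/v2.0.0/schema.json",
    "https://stac-extensions.github.io/datacube/v2.2.0/schema.json",
]


def collection_skeleton(collection_id: str) -> dict:
    """Return the STAC-side fields described above (illustrative only)."""
    store_href = f"{collection_id}.zarr"
    return {
        "type": "Collection",
        "id": collection_id,
        "stac_extensions": list(STAC_EXTENSIONS),
        "links": [{"rel": "store", "href": store_href}],
        "assets": {
            "measurements": {
                "href": f"{store_href}/measurements",
                "type": "application/vnd.zarr; version=3; profile=multiscales",
                # Asset-level datacube metadata; left empty in this sketch.
                "cube:variables": {},
                "cube:dimensions": {},
            }
        },
    }


skeleton = collection_skeleton("demo")
```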
Zarr-side implementation
Root group attributes include:
- zarr_conventions
- exactly one of the projection representations:
  - proj:projjson (preferred)
  - proj:wkt2 (fallback)
  - proj:code (fallback)
- spatial:dimensions
- spatial:bbox
- spatial:shape
- spatial:transform
- multiscales
Registered conventions in zarr_conventions:
- proj:
- spatial:
- multiscales
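The "exactly one projection representation" rule lends itself to a simple check. The following is a minimal sketch that operates on a plain dict of root attributes; the function name check_root_attrs and the sample values are hypothetical, and it does not inspect the contents of zarr_conventions.

```python
PROJ_KEYS = ("proj:projjson", "proj:wkt2", "proj:code")
REQUIRED_KEYS = (
    "zarr_conventions",
    "spatial:dimensions",
    "spatial:bbox",
    "spatial:shape",
    "spatial:transform",
    "multiscales",
)


def check_root_attrs(attrs: dict) -> list:
    """Return a list of problems found in the root group attributes."""
    problems = [f"missing attribute: {key}" for key in REQUIRED_KEYS if key not in attrs]
    present = [key for key in PROJ_KEYS if key in attrs]
    if len(present) != 1:
        problems.append(
            f"expected exactly one projection representation, found {len(present)}"
        )
    return problems


# Hypothetical attribute values, for illustration only.
example_attrs = {
    "zarr_conventions": [],
    "proj:code": "EPSG:32633",
    "spatial:dimensions": ["y", "x"],
    "spatial:bbox": [0.0, 0.0, 10.0, 10.0],
    "spatial:shape": [10, 10],
    "spatial:transform": [1.0, 0.0, 0.0, 0.0, -1.0, 10.0],
    "multiscales": {},
}
problems = check_root_attrs(example_attrs)
```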
Multiscales implementation
For each measurement:
- Base level written to measurements/<measurement>
- Overview levels written to measurements_overviews/<measurement>/<level>/<measurement>
- Root multiscales uses GeoZarr v1 layout metadata:
  - resampling_method
  - layout
- Per-measurement dataset listing is exposed in multiscales:datasets
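The path layout above can be expressed as two small helpers. This is only a sketch: the function names are hypothetical, and how the tool actually names levels (plain integers here) is an assumption.

```python
def base_path(measurement: str) -> str:
    """Path of the full-resolution dataset for a measurement."""
    return f"measurements/{measurement}"


def overview_path(measurement: str, level) -> str:
    """Path of one overview level; the level is formatted as given."""
    return f"measurements_overviews/{measurement}/{level}/{measurement}"


# Assumption: levels are numbered 1..n; the real naming may differ.
paths = [base_path("b04")] + [overview_path("b04", level) for level in (1, 2)]
```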
Overview generation controls:
- --overview-levels
- --continuous-overview-reducer
- --categorical-overview-reducer
Reducers supported:
- mean
- max
- median
- nearest
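To illustrate what the four reducers do, here is a pure-Python sketch that downsamples a 2-D grid by a fixed factor. It is not the tool's implementation: it ignores nodata handling, and treating nearest as "take the top-left cell of each block" is an assumption.

```python
from statistics import mean, median

REDUCERS = {
    "mean": mean,
    "max": max,
    "median": median,
    # Assumption: "nearest" keeps the first (top-left) cell of each block.
    "nearest": lambda block: block[0],
}


def downsample(grid, reducer: str, factor: int = 2):
    """Reduce each factor x factor block of a 2-D list to a single value."""
    reduce_block = REDUCERS[reducer]
    rows, cols = len(grid), len(grid[0])
    out = []
    # Trailing rows/columns that do not fill a whole block are dropped.
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor)]
            out_row.append(reduce_block(block))
        out.append(out_row)
    return out


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
overview_mean = downsample(grid, "mean")
overview_nearest = downsample(grid, "nearest")
```

Reducers such as nearest never create values that are absent from the input, which is one plausible reason to keep a separate reducer option for categorical data.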
CF dataset semantics
For base and overview datasets, data arrays include:
- grid_mapping = "spatial_ref"
- coordinates = "time y x"
Dataset groups include coordinate members:
- time (CF-style numeric time, seconds since epoch)
- x
- y
- spatial_ref
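A minimal sketch of the CF-style time encoding, assuming "epoch" means the Unix epoch (1970-01-01 UTC) and that naive datetimes are interpreted as UTC. The helper name and the exact units string are illustrative assumptions.

```python
from datetime import datetime, timezone

# Assumption: CF units string matching a Unix-epoch reference time.
TIME_UNITS = "seconds since 1970-01-01 00:00:00"


def to_epoch_seconds(value: datetime) -> int:
    """Encode a datetime as whole seconds since the Unix epoch."""
    if value.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        value = value.replace(tzinfo=timezone.utc)
    return int(value.timestamp())


encoded = to_epoch_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc))
```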
The writer validates:
- coordinate references and dimension-shape consistency
- grid_mapping references to existing dataset members
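The checks listed above might look roughly like the following. This sketch models a dataset group as a plain dict of members with dims, shape, and attrs; that model and the function name validate_dataset are assumptions made for illustration, not the writer's actual data structures.

```python
def validate_dataset(members: dict) -> list:
    """Check grid_mapping/coordinates references and 1-D coordinate lengths."""
    problems = []
    for name, member in members.items():
        attrs = member.get("attrs", {})
        grid_mapping = attrs.get("grid_mapping")
        if grid_mapping is not None and grid_mapping not in members:
            problems.append(f"{name}: grid_mapping '{grid_mapping}' not found")
        for coord in attrs.get("coordinates", "").split():
            if coord not in members:
                problems.append(f"{name}: coordinate '{coord}' not found")
                continue
            # A 1-D coordinate must match the length of the same-named dimension.
            if coord in member["dims"]:
                expected = member["shape"][member["dims"].index(coord)]
                if tuple(members[coord]["shape"]) != (expected,):
                    problems.append(f"{name}: coordinate '{coord}' has wrong length")
    return problems


# Hypothetical dataset group with one data array and its coordinate members.
dataset = {
    "b04": {
        "dims": ("time", "y", "x"),
        "shape": (2, 3, 4),
        "attrs": {"grid_mapping": "spatial_ref", "coordinates": "time y x"},
    },
    "time": {"dims": ("time",), "shape": (2,)},
    "y": {"dims": ("y",), "shape": (3,)},
    "x": {"dims": ("x",), "shape": (4,)},
    "spatial_ref": {"dims": (), "shape": ()},
}
dataset_problems = validate_dataset(dataset)
```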
GeoZarr Minispec Compliance (Current)
Reference:
https://eopf-explorer.github.io/data-model/geozarr-minispec/
Implemented:
- zarr_conventions includes proj:, spatial:, multiscales
- spatial:* root attributes (dimensions, bbox, shape, transform)
- GeoZarr v1 multiscales object (resampling_method, layout)
- multiscale data levels with explicit dataset paths
- CF-style dataset members and grid_mapping linkage checks
Forward work items:
- extend reducer-to-resampling mapping if additional methods are introduced
Compliance Check Notes
When running task compliance:check:all:
- compliance:check:multiscales-layout is the authoritative strict check for the emitted GeoZarr v1 layout profile and is expected to pass.
- compliance:check:multiscales validates against the upstream zarr-conventions/multiscales schema and is expected to pass.
Measurement contract
The implementation is Collection-driven:
- collection.item_assets is mandatory
- measurement keys are sourced from collection.item_assets
- each input Item must include all declared measurement keys
- no measurement inference from Item-only extra assets
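The contract above can be sketched as a check over plain STAC dicts. The function name check_measurement_contract and the sample Collection and Items are hypothetical; the sketch only mirrors the four rules as stated and is not the tool's own validation code.

```python
def check_measurement_contract(collection: dict, items: list) -> list:
    """Verify that every Item carries all measurement keys from item_assets."""
    item_assets = collection.get("item_assets")
    if not item_assets:
        raise ValueError("collection.item_assets is mandatory")
    # Measurement keys come only from the Collection, never from Items.
    measurement_keys = list(item_assets)
    problems = []
    for item in items:
        assets = item.get("assets", {})
        missing = [key for key in measurement_keys if key not in assets]
        if missing:
            problems.append(f"item {item.get('id')}: missing {missing}")
    return problems


# Hypothetical inputs: "extra" on item-1 is ignored, item-2 lacks "b08".
collection = {"item_assets": {"b04": {}, "b08": {}}}
items = [
    {"id": "item-1", "assets": {"b04": {}, "b08": {}, "extra": {}}},
    {"id": "item-2", "assets": {"b04": {}}},
]
contract_problems = check_measurement_contract(collection, items)
```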