Definitions
The Best Practice for Earth Observation Application Package addresses data flow management of input and output EO product files. It defines rules for data stage-in and data stage-out for Applications that require staged input files, generate files that need to be staged out, or both.
Data stage-in definition
Data stage-in is the process of retrieving the inputs and making them available for processing. Processing inputs are provided as catalogue references, and the Platform is responsible for translating those references into files available for local processing.
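The stage-in idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue reference is assumed to have been resolved already into a STAC Item document, and `stage_in` and its `fetch` hook are hypothetical names, not part of any Platform API.

```python
import copy
import json
import shutil
from pathlib import Path


def stage_in(item: dict, target_dir: Path, fetch=shutil.copy) -> Path:
    """Copy every asset of a STAC Item into target_dir and write a local manifest.

    `item` stands for the already-resolved catalogue reference; `fetch(src, dst)`
    is a placeholder for whatever download mechanism a Platform uses.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    local_item = copy.deepcopy(item)  # leave the caller's document untouched
    for asset in local_item["assets"].values():
        local_name = Path(asset["href"]).name
        fetch(asset["href"], target_dir / local_name)
        asset["href"] = f"./{local_name}"  # relative href to the staged file
    item_file = f"{local_item['id']}.json"
    (target_dir / item_file).write_text(json.dumps(local_item, indent=2))
    # Minimal STAC Catalog acting as the manifest of the staged directory.
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "staged-inputs",
        "description": "Local manifest of staged inputs",
        "links": [{"rel": "item", "href": f"./{item_file}"}],
    }
    catalog_path = target_dir / "catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return catalog_path
```

With the default `fetch`, asset hrefs must be local paths; a real Platform would substitute a function that downloads from remote storage.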
Data stage-out definition
Data stage-out is the process of uploading the output files generated by the processing to external system(s) and making them available for later use. The Platform retrieves the processing outputs and automatically stores them on external persistent storage. Additionally, the Platform should publish the metadata of the outputs to a Catalogue and provide their references as an output.
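A corresponding sketch of stage-out, again only illustrative: it assumes the output directory contains a `catalog.json` whose `item` links point to STAC Items with relative asset hrefs. The `upload` and `publish` callables are hypothetical hooks standing in for the Platform's storage and catalogue services.

```python
import json
from pathlib import Path
from typing import Callable


def stage_out(output_dir: Path,
              upload: Callable[[Path], str],
              publish: Callable[[dict], str]) -> list:
    """Upload the assets listed in output_dir/catalog.json and publish the items.

    `upload` maps a local file to its remote URL; `publish` registers an item
    in a catalogue and returns its reference. Both are assumptions.
    """
    catalog = json.loads((output_dir / "catalog.json").read_text())
    references = []
    for link in catalog["links"]:
        if link["rel"] != "item":
            continue  # skip root/self/parent links
        item_path = output_dir / link["href"]
        item = json.loads(item_path.read_text())
        for asset in item["assets"].values():
            local_file = item_path.parent / asset["href"]
            asset["href"] = upload(local_file)  # href now points at persistent storage
        references.append(publish(item))
    return references
```

The returned list plays the role of the references that the Platform provides as an output.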
Application Data Flow Management
The Application data flow management relies on the following rules:
- The computational workflow data interfaces use the SpatioTemporal Asset Catalog (STAC) to describe the EO data inputs and generated results.
- Stage-in
  - All input parameters of the CWL CommandLineTool that require the staging of EO products shall be of type Directory.
  - All input parameters of the CWL Workflow that require the staging of EO products shall be of type Directory.
  - Applications find a STAC catalog.json file in the staged inputs of type Directory.
- Stage-out
  - Applications produce a STAC catalog.json in all outputs of type Directory.
  - The Workflow outputs that require the stage-out of the generated products shall be of type Directory.
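The typing rules above can be checked mechanically. The following sketch assumes the CWL document has already been parsed from YAML into a Python dict and that inputs and outputs use the map form; the parameter names, the `app.cwl` file name, and the `violations` helper are all hypothetical. It takes a strict reading of the rule and accepts only the plain `Directory` type, so array or optional types are not handled.

```python
# A CWL Workflow as it might look after parsing; all names are illustrative.
workflow = {
    "class": "Workflow",
    "cwlVersion": "v1.2",
    "inputs": {
        "product": {"type": "Directory"},  # EO product to be staged in
        "threshold": {"type": "float"},    # plain parameter, no staging
        "aoi": "string",                   # shorthand form: id -> type
    },
    "outputs": {
        "results": {"type": "Directory", "outputSource": "process/results"},
    },
    "steps": {
        "process": {
            "run": "app.cwl",
            "in": {"product": "product", "threshold": "threshold", "aoi": "aoi"},
            "out": ["results"],
        },
    },
}


def param_type(param):
    """Return the declared type for both the shorthand and the expanded form."""
    return param if isinstance(param, str) else param.get("type")


def violations(doc: dict, staged_inputs: set, staged_outputs: set) -> list:
    """List parameters that need staging but are not declared as Directory."""
    problems = []
    for section, names in (("inputs", staged_inputs), ("outputs", staged_outputs)):
        for name in sorted(names):
            declared = param_type(doc[section][name])
            if declared != "Directory":
                problems.append(f"{section}.{name}: {declared}")
    return problems
```

Which parameters require staging is knowledge the caller must supply; the CWL document alone does not say so.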
Platform Data Flow Management
A Platform is responsible for the data flow management. It uses a local catalogue, encoded according to the SpatioTemporal Asset Catalog (STAC) specification, as a data manifest for application inputs and outputs.
The local catalogue describes the contents of the input and output files, such as the spatial footprint, sub-items (e.g. masks, bands) and additional metadata.
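As an illustration of what such a manifest entry can carry, the sketch below builds a STAC Item as a Python dict. Every value in it (identifier, coordinates, date, file names, asset keys and role names) is invented for the example, and `assets_with_role` is a hypothetical helper.

```python
# Illustrative STAC Item: footprint, sub-items (bands, mask) and metadata.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-acquisition",
    "bbox": [12.0, 41.5, 13.0, 42.5],  # spatial footprint (west, south, east, north)
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[12.0, 41.5], [13.0, 41.5], [13.0, 42.5],
                         [12.0, 42.5], [12.0, 41.5]]],
    },
    "properties": {"datetime": "2021-06-01T10:00:00Z"},  # additional metadata
    "links": [],
    "assets": {
        "red": {"href": "./B04.tif", "roles": ["data"]},
        "nir": {"href": "./B08.tif", "roles": ["data"]},
        "cloud-mask": {"href": "./cloud_mask.tif", "roles": ["mask"]},
    },
}


def assets_with_role(stac_item: dict, role: str) -> list:
    """Return the sorted asset keys whose `roles` list contains `role`."""
    return sorted(key for key, asset in stac_item["assets"].items()
                  if role in asset.get("roles", []))
```

An application can use such a lookup to select the bands or masks it needs without guessing file names.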
Wrapping the Application Package
To wrap an Application Package:
- plug a stage-in step for all workflow inputs of type Directory
- plug a stage-out step for all workflow outputs of type Directory
The outcome is a wrapped CWL workflow that takes:
- the application package parameters
- any stage-in/stage-out parameters the platform may need to perform these operations
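The wrapping can be sketched as a transformation of the parsed CWL document. This is a simplified illustration, not a Platform implementation. It assumes inputs and outputs use the expanded map form, that the stage-in tool accepts a `reference` string and emits `staged`, and that the stage-out tool accepts `directory` and emits `published`. These port names and the `wrap` function itself are hypothetical, and name clashes between platform and application parameters are not handled.

```python
import copy


def wrap(app: dict, stage_in_tool: str, stage_out_tool: str,
         stage_in_params: dict = None, stage_out_params: dict = None) -> dict:
    """Build a Workflow that surrounds `app` with stage-in and stage-out steps."""
    stage_in_params = stage_in_params or {}
    stage_out_params = stage_out_params or {}
    # Platform parameters become inputs of the wrapped workflow.
    inputs = {**copy.deepcopy(stage_in_params), **copy.deepcopy(stage_out_params)}
    steps, app_in = {}, {}
    for name, param in app["inputs"].items():
        if param["type"] == "Directory":
            inputs[name] = {"type": "string"}  # catalogue reference, not files
            steps[f"stage_in_{name}"] = {
                "run": stage_in_tool,
                "in": {"reference": name, **{p: p for p in stage_in_params}},
                "out": ["staged"],
            }
            app_in[name] = f"stage_in_{name}/staged"
        else:
            inputs[name] = copy.deepcopy(param)
            app_in[name] = name
    steps["app"] = {"run": app, "in": app_in, "out": list(app["outputs"])}
    outputs = {}
    for name, param in app["outputs"].items():
        if param["type"] == "Directory":
            steps[f"stage_out_{name}"] = {
                "run": stage_out_tool,
                "in": {"directory": f"app/{name}", **{p: p for p in stage_out_params}},
                "out": ["published"],
            }
            outputs[name] = {"type": "string",
                             "outputSource": f"stage_out_{name}/published"}
        else:
            outputs[name] = {"type": param["type"], "outputSource": f"app/{name}"}
    return {
        "class": "Workflow",
        "cwlVersion": "v1.2",
        # Needed because the application workflow runs as a nested step.
        "requirements": {"SubworkflowFeatureRequirement": {}},
        "inputs": inputs,
        "outputs": outputs,
        "steps": steps,
    }
```

In the wrapped workflow, each Directory input of the application is replaced by a string reference, and each Directory output by the reference returned from stage-out.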