Execution Guide
This guide details how to execute the processing module, leveraging Dask for parallel processing in a distributed cluster environment and Calrissian for CWL workflow orchestration.
Distributed Cluster Setup
To enable parallel processing with Dask, access to a Dask Gateway is required: the module connects to it to create and scale a Dask cluster.
Prerequisites
- Dask Gateway Deployment: It is assumed that a Dask Gateway is already deployed and accessible. The dev-platform-eoap repository provides a streamlined method for deploying Dask Gateway on Kubernetes, which can then be integrated with a Code-Server for module execution.
- Dask Worker Image: For Dask to utilize your module's dependencies and code, a custom Docker image for the Dask workers must be built and made accessible to the Kubernetes cluster. This image should contain all the necessary Python packages and your module's code.
- Building and Providing the Worker Image: The easiest way to build the worker image for this module is to navigate into the eopf-sentinel-2/ directory (where the Dockerfile for the worker is located) and run a Docker build command:

  cd eopf-sentinel-2/
  docker build -t eopf-sentinel-2-worker:latest .

  Once built, this image needs to be made available to your Kubernetes cluster.
- Using ttl.sh for Temporary Images: For development or testing, ttl.sh provides an anonymous, ephemeral registry. This allows you to quickly build and push a temporary, pullable image without authentication, which expires after a set duration (e.g., 1 hour):

  IMAGE_NAME=eopf-sentinel-2-worker
  docker build -t ttl.sh/${IMAGE_NAME}:1h .
  docker push ttl.sh/${IMAGE_NAME}:1h
  echo "Temporary image available at: ttl.sh/${IMAGE_NAME}:1h"

  You would then use this ttl.sh image name in your Dask Gateway configuration or Calrissian workflow, as shown in the sketch after this list.
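As a minimal sketch of how the worker image is used when requesting a cluster from the Dask Gateway (assuming the Gateway deployment exposes an image cluster option and is reachable at the in-cluster address used elsewhere in this guide), cluster creation from Python could look like this:

from dask_gateway import Gateway

# Address of the deployed Dask Gateway; adjust to your deployment.
gateway = Gateway("http://traefik-dask-gateway.eoap-dask-gateway.svc.cluster.local:80")

# Cluster options exposed by the Gateway deployment; the 'image' option is
# only present if the deployment was configured to expose it.
options = gateway.cluster_options()
options.image = "ttl.sh/eopf-sentinel-2-worker:1h"

cluster = gateway.new_cluster(options)  # create a new Dask cluster
cluster.scale(4)                        # request four workers
client = cluster.get_client()           # Dask client bound to the cluster
print(client.dashboard_link)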
Once the Dask Gateway is available, you can proceed with running the processing module:
- Direct Execution: The provided Jupyter Notebook demonstrates how to execute the module directly using its main function.
- Command-Line Interface (CLI): Alternatively, the module can be executed via its CLI, offering a more traditional command-line approach.
Calrissian CWL Execution
This module includes a Common Workflow Language (CWL) workflow, optimized for parallel processing on a Dask Cluster. A new CWL extension, DaskGatewayRequirements, has been developed to enable CWL runners like Calrissian to leverage Dask's distributed computing capabilities.
Environment Setup:
Ensure the environment described in the Distributed Cluster Setup section is established.
Execution Methods:
The workflow file can be executed using Calrissian in two primary ways:
- Direct Calrissian Command: Execute the workflow directly using the calrissian command:
calrissian \
--stdout \
/calrissian/results.json \
--stderr \
/calrissian/app.log \
--max-ram \
16G \
--max-cores \
"8" \
--tmp-outdir-prefix \
/calrissian/tmp/ \
--outdir \
/calrissian/results \
--usage-report \
/calrissian/usage.json \
--tool-logs-basepath \
/calrissian/logs \
--pod-serviceaccount \
calrissian-sa \
--dask-gateway-url \
"http://traefik-dask-gateway.eoap-dask-gateway.svc.cluster.local:80" \
https://github.com/eoap/dask-app-package/releases/download/1.0.1/eopf-sentinel-2.1.0.1.cwl \
--input-url "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l1c/items/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458"
- Kubernetes Job: For robust and scalable execution, you can submit the workflow as a Kubernetes Job:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: calrissian-sentinel2
spec:
  ttlSecondsAfterFinished: 60
  template:
    spec:
      serviceAccountName: default
      securityContext:
        runAsUser: 0
        runAsGroup: 0
      containers:
        - name: calrissian
          image: calrissian:0.19.0
          command: ["calrissian"]
          args:
            - --debug
            - --stdout
            - /calrissian/results.json
            - --stderr
            - /calrissian/app.log
            - --max-ram
            - 16G
            - --max-cores
            - "8"
            - --tmp-outdir-prefix
            - /calrissian/tmp/
            - --outdir
            - /calrissian/results
            - --usage-report
            - /calrissian/usage.json
            - --tool-logs-basepath
            - /calrissian/logs
            - --dask-gateway-url
            - "http://traefik-dask-gateway.eoap-dask-gateway.svc.cluster.local:80"
            - https://github.com/eoap/dask-app-package/releases/download/1.0.1/eopf-sentinel-2.1.0.1.cwl
            - --item-url
            - "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l1c/items/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458"
          volumeMounts:
            - name: calrissian-volume
              mountPath: /calrissian
          env:
            - name: CALRISSIAN_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
      restartPolicy: Never
      volumes:
        - name: calrissian-volume
          persistentVolumeClaim:
            claimName: calrissian-claim
  backoffLimit: 3
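After saving this manifest (for example as calrissian-sentinel2-job.yaml), it can be submitted with kubectl apply -f calrissian-sentinel2-job.yaml, and progress can be followed through the logs of the resulting pod.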
The latest version of the CWL workflow can be found on the dask-app-package releases page: https://github.com/eoap/dask-app-package/releases
Input Data
The pipeline is designed to process Sentinel-2 STAC items where the assets are in Zarr format.
Example Input URL:

https://stac.core.eopf.eodc.eu/collections/sentinel-2-l1c/items/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458

More information on the EOPF Sentinel Zarr products is available at https://zarr.eopf.copernicus.eu/.
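As a minimal sketch of how such an item can be inspected from Python (this assumes pystac and a recent xarray with Zarr support are installed; the asset key used below is a placeholder to be replaced with the key actually listed on the item):

import pystac
import xarray as xr

ITEM_URL = (
    "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l1c/items/"
    "S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458"
)

# Load the STAC item and list its assets to locate the Zarr store.
item = pystac.Item.from_file(ITEM_URL)
for key, asset in item.assets.items():
    print(key, asset.media_type, asset.href)

# Open the Zarr product lazily; "product" is a placeholder asset key.
# Use whichever key the listing above shows for the Zarr store.
zarr_href = item.assets["product"].href
datatree = xr.open_datatree(zarr_href, engine="zarr", chunks={})
print(datatree)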
Output Data
Upon successful execution, the pipeline generates the following outputs:
- Common Vegetation and Water Indices (Zarr format), each a normalized difference of two bands (see the sketch after this list):
  - ndvi: Normalized Difference Vegetation Index (NIR, Red)
  - ndwi: Normalized Difference Water Index (Green, NIR)
  - ndmir: Normalized Difference Moisture Index (SWIR16, SWIR22)
  - nbr: Normalized Burn Ratio (NIR, SWIR22)
  - ndwi2: (NIR, SWIR16)
  - mndwi: Modified Normalized Difference Water Index (Green, SWIR16)
  - ndbi: Normalized Difference Built-up Index (SWIR16, NIR)
- Various Color-Corrected Composite Images (Zarr format):
  - rgb: Natural Color Composite (Red, Green, Blue)
  - vea: Vegetation Analysis Composite (SWIR16, SWIR22, Red)
  - civ: Color Infrared Vegetation Composite (NIR, Red, Green)
  - law: Land/Water Composite (NIR, SWIR16, Red)
  - sir: Shortwave Infrared Composite (SWIR22, NIR, Red)
  - fcu: False Color Urban Composite (SWIR22, SWIR16, Red)
  - atp: Atmospheric Penetration Composite (SWIR22, SWIR16, NIR)
- Newly Created STAC Item: A new STAC item referencing the newly generated products and their assets.
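As a minimal sketch of the pattern behind these indices (the band variable names are illustrative and depend on how the opened Zarr product names its measurements):

import xarray as xr

def normalized_difference(a: xr.DataArray, b: xr.DataArray) -> xr.DataArray:
    """Generic (a - b) / (a + b) ratio used by ndvi, ndwi, nbr and the other indices."""
    return (a - b) / (a + b)

# Example: NDVI from NIR and Red reflectances. The variable names below are
# illustrative placeholders; use the band names exposed by the Zarr product.
# ndvi = normalized_difference(ds["b08"], ds["b04"])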