How-to Guide: Creating Nested Workflows in CWL
This guide explains how to build and use nested workflows in CWL with the SubworkflowFeatureRequirement.
It focuses on workflow composition: integrating subworkflows as steps so that they become reusable components.
Objective
- Main workflow: accepts inputs and calls a subworkflow (rgb-composite) as a single step.
- Subworkflow (rgb-composite): runs a series of steps that fetch the requested bands of a STAC item, stack them and color-correct the result into an RGB composite.
Key Blocks
- SubworkflowFeatureRequirement
The SubworkflowFeatureRequirement allows a workflow to run other workflows as steps. Declare it in the requirements of the parent (calling) workflow:
requirements:
  SubworkflowFeatureRequirement: {}
- Subworkflow Step Definition
The main workflow calls the subworkflow with a step like this:
step_rgb_composite:
  in:
    stac-item: stac-item
    bands: bands
  out:
    - rgb-tif
  run: "#rgb-composite"
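- run: "#rgb-composite": points the step at the process whose id is rgb-composite, defined in the same CWL document.
- Inputs and outputs: the step forwards the main workflow's stac-item and bands inputs to the subworkflow inputs of the same names, and exposes the subworkflow's rgb-tif output.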
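To see what the requirement and the run reference imply together, the following Python sketch checks a step that runs a subworkflow. It assumes that all processes are packed into a single document's $graph list. The #rgb-composite fragment reference suggests this layout, although the guide does not show the file header. The document is modelled as plain dicts to avoid a YAML dependency. The function name check_nested_step is hypothetical; cwltool performs its own, more complete checks when it loads a workflow.

```python
# Sketch: validate a step that runs a subworkflow inside a packed CWL
# document ("$graph" list of processes, modelled here as plain dicts).
# Checks: the "#id" reference resolves, the parent declares
# SubworkflowFeatureRequirement when the target is a Workflow, and the
# step's in/out ids exist on the target process.

def check_nested_step(graph: list[dict], parent_id: str, step_id: str) -> list[str]:
    """Return human-readable problems (empty list if the step is consistent)."""
    processes = {p["id"]: p for p in graph}
    parent = processes[parent_id]
    step = parent["steps"][step_id]
    target_id = step["run"].lstrip("#")  # "#rgb-composite" -> "rgb-composite"
    target = processes.get(target_id)
    if target is None:
        return [f"{step_id}: run {step['run']!r} matches no process id"]

    problems = []
    # A Workflow used as a step needs the feature declared on the parent.
    if (target.get("class") == "Workflow"
            and "SubworkflowFeatureRequirement" not in parent.get("requirements", {})):
        problems.append(f"{parent_id}: missing SubworkflowFeatureRequirement")
    # Keys of "in" must be input ids of the target process.
    for name in step.get("in", {}):
        if name not in target.get("inputs", {}):
            problems.append(f"{step_id}: {name!r} is not an input of {target_id}")
    # Entries of "out" must be output ids of the target process.
    for name in step.get("out", []):
        if name not in target.get("outputs", {}):
            problems.append(f"{step_id}: {name!r} is not an output of {target_id}")
    return problems


graph = [
    {
        "class": "Workflow",
        "id": "main",
        "requirements": {"SubworkflowFeatureRequirement": {}},
        "steps": {
            "step_rgb_composite": {
                "in": {"stac-item": "stac-item", "bands": "bands"},
                "out": ["rgb-tif"],
                "run": "#rgb-composite",
            }
        },
    },
    {
        "class": "Workflow",
        "id": "rgb-composite",
        "inputs": {"stac-item": {"type": "string"}, "bands": {"type": "string[]"}},
        "outputs": {"rgb-tif": {"type": "File", "outputSource": "step_color/rgb"}},
    },
]

print(check_nested_step(graph, "main", "step_rgb_composite"))  # []
```

Removing SubworkflowFeatureRequirement from main, or misspelling the run reference, makes the function report the problem instead of returning an empty list.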
Steps
- Define the Subworkflow
The rgb-composite subworkflow performs the following steps:
- Fetch the band-specific asset URLs with the stac tool, once per band.
- Stack the asset TIFFs into a single file with the rio_stack tool.
- Apply color correction to the stacked file with the rio_color tool to generate the RGB composite.
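The guide does not show the stac tool itself. Judging from the log, it downloads the STAC item with curl into a file named message. The selection of the band href presumably happens afterwards, for example in a JavaScript expression, given the InlineJavascriptRequirement. The Python sketch below mirrors that selection. It assumes that each asset lists its band in an eo:bands array with a common_name field, as defined by the STAC electro-optical (eo) extension. The item excerpt is trimmed and hand-written for illustration; only the hrefs are copied from the run log. The helper href_for_band is hypothetical.

```python
# Sketch of the band lookup the stac step is assumed to perform:
# for one common band name, return the href of the asset whose
# "eo:bands" entry carries that common_name (STAC eo extension).

BASE = ("https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/"
        "53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A")

# Trimmed, hand-written item excerpt (assumed structure; hrefs taken from the run log).
item = {
    "assets": {
        "B02": {"href": f"{BASE}/B02.tif", "eo:bands": [{"name": "B02", "common_name": "blue"}]},
        "B03": {"href": f"{BASE}/B03.tif", "eo:bands": [{"name": "B03", "common_name": "green"}]},
        "B04": {"href": f"{BASE}/B04.tif", "eo:bands": [{"name": "B04", "common_name": "red"}]},
    }
}


def href_for_band(stac_item: dict, common_band_name: str) -> str:
    """Return the href of the asset whose eo:bands entry matches common_band_name."""
    for asset in stac_item["assets"].values():
        for band in asset.get("eo:bands", []):
            if band.get("common_name") == common_band_name:
                return asset["href"]
    raise KeyError(f"no asset with common_name {common_band_name!r}")


for name in ["red", "green", "blue"]:
    print(name, href_for_band(item, name))
```

Looking up red, green and blue in that order yields the B04, B03 and B02 hrefs. This is the same order in which they appear as rio stack arguments in the log.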
Subworkflow Definition (rgb-composite)
class: Workflow
id: rgb-composite
requirements:
  InlineJavascriptRequirement: {}
  NetworkAccess:
    networkAccess: true
  ScatterFeatureRequirement: {}
inputs:
  stac-item:
    type: string
  bands:
    type: string[]
outputs:
  rgb-tif:
    outputSource: step_color/rgb
    type: File
steps:
  step_curl:
    in:
      stac_item: stac-item
      common_band_name: bands
    out:
      - hrefs
    run: "#stac"
    scatter: common_band_name      # one stac job per band name
    scatterMethod: dotproduct
  step_stack:
    in:
      tiffs:
        source: step_curl/hrefs    # array of hrefs gathered from the scatter
    out:
      - stacked
    run: "#rio_stack"
  step_color:
    in:
      stacked:
        source: step_stack/stacked
    out:
      - rgb
    run: "#rio_color"
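The scatter on step_curl makes the stac tool run once per element of bands. The runner then gathers the per-job hrefs outputs into an array, in input order, and step_stack receives that array as tiffs. With a single scattered parameter, scatterMethod is optional and dotproduct behaves like a plain scatter. The method only matters when several parameters are scattered together: dotproduct then pairs elements by index, and the arrays must have the same length. The Python sketch below emulates these semantics. scatter_dotproduct and the stand-in fake_stac are hypothetical helpers, not cwltool internals.

```python
# Emulation of CWL scatter with scatterMethod: dotproduct.
# Each job takes the i-th element of every scattered input; non-scattered
# inputs (here stac_item) are passed unchanged to every job. Results are
# gathered into one list per output, in input order.
from typing import Callable


def scatter_dotproduct(run: Callable[..., dict], scatter: list[str], inputs: dict) -> dict:
    lengths = {len(inputs[name]) for name in scatter}
    if len(lengths) != 1:
        raise ValueError("dotproduct requires scattered arrays of equal length")
    n = lengths.pop()
    gathered: dict[str, list] = {}
    for i in range(n):
        job_inputs = {k: (v[i] if k in scatter else v) for k, v in inputs.items()}
        for out_name, value in run(**job_inputs).items():
            gathered.setdefault(out_name, []).append(value)
    return gathered


def fake_stac(stac_item: str, common_band_name: str) -> dict:
    """Hypothetical stand-in for the stac tool: returns a placeholder href."""
    return {"hrefs": f"{stac_item}:{common_band_name}"}


result = scatter_dotproduct(
    fake_stac,
    scatter=["common_band_name"],
    inputs={"stac_item": "item.json", "common_band_name": ["red", "green", "blue"]},
)
print(result["hrefs"])  # ['item.json:red', 'item.json:green', 'item.json:blue']
```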
- Define the Main Workflow
The main workflow invokes the rgb-composite subworkflow:
class: Workflow
id: main
requirements:
  SubworkflowFeatureRequirement: {}   # needed because step_rgb_composite runs a Workflow
  InlineJavascriptRequirement: {}
  NetworkAccess:
    networkAccess: true
  ScatterFeatureRequirement: {}
inputs:
  stac-item:
    type: string
  bands:
    type: string[]
    default: ["red", "green", "blue"]
outputs:
  rgb-tif:
    outputSource: step_rgb_composite/rgb-tif
    type: File
steps:
  step_rgb_composite:
    in:
      stac-item: stac-item
      bands: bands
    out:
      - rgb-tif
    run: "#rgb-composite"   # the subworkflow, referenced by its id
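Requirements:
- SubworkflowFeatureRequirement: enables the use of nested workflows.
- ScatterFeatureRequirement: allows a step to be scattered, that is, run once per element of an array input; here the stac step runs once per band. cwltool runs the scattered jobs one after another unless it is started with --parallel.
Inputs:
- stac-item: URL of a STAC item.
- bands: array of band common names (default: ["red", "green", "blue"]).
Outputs:
- rgb-tif: the RGB composite file produced by the subworkflow.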
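Instead of passing --stac-item on the command line, the inputs can also be supplied in a job file. cwltool accepts a job file as a second positional argument after the workflow, for example cwltool nested-workflow.cwl params.json. The sketch below builds such a job object and fills in declared defaults for omitted inputs, as a runner does. The file name params.json and the helper resolve_job are illustrative assumptions. Optional types such as string? are not modelled.

```python
# Sketch: build a CWL job object (input values) and apply declared
# defaults for inputs the user did not provide, as a runner would.
# Simplification: an input without a default is treated as required.
import json

MAIN_INPUTS = {
    "stac-item": {"type": "string"},
    "bands": {"type": "string[]", "default": ["red", "green", "blue"]},
}


def resolve_job(declared: dict, provided: dict) -> dict:
    job = {}
    for name, spec in declared.items():
        if name in provided:
            job[name] = provided[name]          # user value wins
        elif "default" in spec:
            job[name] = spec["default"]         # fall back to the declared default
        else:
            raise ValueError(f"missing required input {name!r}")
    return job


provided = {
    "stac-item": "https://earth-search.aws.element84.com/v0/collections/"
                 "sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A",
}
job = resolve_job(MAIN_INPUTS, provided)
print(json.dumps(job, indent=2))  # content a params.json job file could hold
```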
- Run the Workflow
To execute the main workflow, run the following command (bands is omitted, so its default is used):
cwltool nested-workflow.cwl \
  --stac-item https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A
cwltool logs each step and each scattered job as it runs:
INFO /opt/hostedtoolcache/Python/3.12.8/x64/bin/cwltool 3.1.20241217163858
INFO Resolved '../cwl-workflows/nested-workflow.cwl' to 'file:///home/runner/work/how-to/how-to/cwl-workflows/nested-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step step_rgb_composite
INFO [step step_rgb_composite] start
INFO [workflow step_rgb_composite] start
INFO [workflow step_rgb_composite] starting step step_curl
INFO [step step_curl] start
INFO [job step_curl] /tmp/rxp6zp3z$ docker \
run \
-i \
--mount=type=bind,source=/tmp/rxp6zp3z,target=/BLoLJe \
--mount=type=bind,source=/tmp/66zqooik,target=/tmp \
--workdir=/BLoLJe \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/6a4pztsf/20250102120301-447436.cid \
--env=TMPDIR=/tmp \
--env=HOME=/BLoLJe \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/rxp6zp3z/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 40358 0 --:--:-- --:--:-- --:--:-- 40301
INFO [job step_curl] completed success
INFO [step step_curl] start
INFO [job step_curl_2] /tmp/l8nzw3c5$ docker \
run \
-i \
--mount=type=bind,source=/tmp/l8nzw3c5,target=/BLoLJe \
--mount=type=bind,source=/tmp/hn9jqmph,target=/tmp \
--workdir=/BLoLJe \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/cku3iwqr/20250102120302-458426.cid \
--env=TMPDIR=/tmp \
--env=HOME=/BLoLJe \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/l8nzw3c5/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 21704 0 --:--:-- --:--:-- --:--:-- 21700
100 10156 100 10156 0 0 21696 0 --:--:-- --:--:-- --:--:-- 21654
INFO [job step_curl_2] completed success
INFO [step step_curl] start
INFO [job step_curl_3] /tmp/aycqs7xm$ docker \
run \
-i \
--mount=type=bind,source=/tmp/aycqs7xm,target=/BLoLJe \
--mount=type=bind,source=/tmp/alwzy183,target=/tmp \
--workdir=/BLoLJe \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/oztsl6wy/20250102120303-470251.cid \
--env=TMPDIR=/tmp \
--env=HOME=/BLoLJe \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/aycqs7xm/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
77 10156 77 7781 0 0 46227 0 --:--:-- --:--:-- --:--:-- 46041
100 10156 100 10156 0 0 60267 0 --:--:-- --:--:-- --:--:-- 60094
INFO [job step_curl_3] completed success
INFO [step step_curl] completed success
INFO [workflow step_rgb_composite] starting step step_stack
INFO [step step_stack] start
INFO [job step_stack] /tmp/cu8pn1ro$ docker \
run \
-i \
--mount=type=bind,source=/tmp/cu8pn1ro,target=/BLoLJe \
--mount=type=bind,source=/tmp/56t9u_62,target=/tmp \
--workdir=/BLoLJe \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/cmduzayv/20250102120304-496226.cid \
--env=TMPDIR=/tmp \
--env=HOME=/BLoLJe \
--env=CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif \
--env=GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES \
--env=GDAL_TIFF_INTERNAL_MASK=YES \
ghcr.io/eoap/how-to/rio:1.0.0 \
rio \
stack \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B04.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B03.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B02.tif \
stacked.tif
INFO [job step_stack] Max memory used: 1249MiB
INFO [job step_stack] completed success
INFO [step step_stack] completed success
INFO [workflow step_rgb_composite] starting step step_color
INFO [step step_color] start
INFO [job step_color] /tmp/13k1inai$ docker \
run \
-i \
--mount=type=bind,source=/tmp/13k1inai,target=/BLoLJe \
--mount=type=bind,source=/tmp/1z_100qi,target=/tmp \
--mount=type=bind,source=/tmp/cu8pn1ro/stacked.tif,target=/var/lib/cwl/stg8ec29fc9-3b21-47d5-bbb2-44b316e73627/stacked.tif,readonly \
--workdir=/BLoLJe \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/iqz80yck/20250102120334-167997.cid \
--env=TMPDIR=/tmp \
--env=HOME=/BLoLJe \
ghcr.io/eoap/how-to/rio:1.0.0 \
rio \
color \
-j \
-1 \
--out-dtype \
uint8 \
/var/lib/cwl/stg8ec29fc9-3b21-47d5-bbb2-44b316e73627/stacked.tif \
rgb.tif \
'gamma 3 0.95, sigmoidal rgb 35 0.13'
INFO [job step_color] Max memory used: 730MiB
INFO [job step_color] completed success
INFO [step step_color] completed success
INFO [workflow step_rgb_composite] completed success
INFO [step step_rgb_composite] completed success
INFO [workflow ] completed success
INFO Final process status is success
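The log shows the effect of the scatter. step_curl ran three jobs (step_curl, step_curl_2, step_curl_3), one per band, and they ran sequentially under cwltool's default executor. The sketch below counts completed jobs per step from log lines like the ones above. The regular expression is written against this log's format and is an assumption, not a stable cwltool interface.

```python
# Sketch: count completed jobs per step in a cwltool log.
# Scattered jobs get numeric suffixes (step_curl, step_curl_2, ...), which
# are stripped so they are counted under their step name.
# Caveat: a step whose own name ends in _<digits> would be miscounted.
import re
from collections import Counter

JOB_DONE = re.compile(r"\[job (?P<job>[\w-]+?)(?:_\d+)?\] completed success")


def jobs_per_step(log_text: str) -> Counter:
    return Counter(m.group("job") for m in JOB_DONE.finditer(log_text))


log = """\
INFO [job step_curl] completed success
INFO [job step_curl_2] completed success
INFO [job step_curl_3] completed success
INFO [step step_curl] completed success
INFO [job step_stack] completed success
INFO [job step_color] completed success
"""

print(jobs_per_step(log))
```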
- Expected Output
Intermediate Outputs:
- URLs of band-specific TIFFs (hrefs).
- Stacked TIFF file (stacked.tif).
Final Output:
- RGB composite TIFF file (rgb-tif), reported by cwltool as the following output object:
{
  "rgb-tif": {
    "location": "file:///home/runner/work/how-to/how-to/docs/rgb.tif",
    "basename": "rgb.tif",
    "class": "File",
    "checksum": "sha1$b89f97eadfd9cf3c35bd3a588f583374f97c68a0",
    "size": 361747464,
    "path": "/home/runner/work/how-to/how-to/docs/rgb.tif"
  }
}
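cwltool reports each output File with its size and a checksum of the form sha1$<hex digest>. To confirm that the file on disk matches the reported object, recompute the digest. The sketch below does so with hashlib and reads in chunks, because the composite here is roughly 360 MB. The function name verify_cwl_file is hypothetical.

```python
# Sketch: verify a CWL File output object (as printed by cwltool)
# against the file on disk: compare the size and the "<algo>$<hexdigest>" checksum.
import hashlib
import os


def verify_cwl_file(file_obj: dict, chunk_size: int = 1 << 20) -> bool:
    path = file_obj["path"]
    if os.path.getsize(path) != file_obj["size"]:
        return False
    algo, _, expected = file_obj["checksum"].partition("$")
    digest = hashlib.new(algo)  # "sha1" for the checksum shown above
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large rasters are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

cwltool writes its log to standard error and the output object to standard output. You can therefore redirect standard output to a file such as outputs.json, load it with json.load, and pass the "rgb-tif" entry to the function.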
Key Takeaways
Modularity with subworkflows:
- Declare SubworkflowFeatureRequirement in the parent workflow so that it can run other workflows as steps.
- Subworkflows simplify complex workflows by isolating specific logic.
Integration of subworkflows:
- Define a step in the main workflow for each subworkflow call.
- Use run to link the step to the subworkflow.
Reusability:
- Subworkflows can be reused in multiple workflows, promoting modularity and efficiency.
This approach makes it easy to manage and scale CWL workflows by leveraging nested subworkflows.