How-to Guide: Scattering Workflows
This guide demonstrates how to use scattering in CWL workflows, where a step runs once per element of an array input and the resulting jobs can execute in parallel.
The focus is on the ScatterFeatureRequirement and the scatter and scatterMethod fields.
Objective
- Execute a subworkflow (rgb-composite) in parallel for multiple stac-items.
- Scatter the input stac-items to create multiple RGB composite outputs.
Key Concepts
ScatterFeatureRequirement
The ScatterFeatureRequirement enables scattering, which lets a workflow step run once for each element of an array input.
requirements:
  ScatterFeatureRequirement: {}
Each element is processed independently as a separate job, so the jobs can run in parallel when the workflow runner supports it.
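As a minimal, self-contained illustration (not part of this guide's workflow), the following CWL document scatters a hypothetical echo tool over an array of strings. The names messages, step_echo, echo and message.txt are invented for this sketch. Each element of messages becomes its own echo job, and the per-job outputs are gathered into a File[] array.

```yaml
# Minimal, hypothetical example: scatter an "echo" tool over an array of strings.
cwlVersion: v1.2
$graph:
  - class: Workflow
    id: main
    requirements:
      ScatterFeatureRequirement: {}   # required to use "scatter" on a step
    inputs:
      messages:
        type: string[]
    outputs:
      echoed:
        type: File[]                  # one file per scattered job, gathered in order
        outputSource: step_echo/out
    steps:
      step_echo:
        in:
          message: messages           # the whole array is wired to "message"...
        out:
          - out
        run: "#echo"
        scatter: message              # ...and split into one job per element
  - class: CommandLineTool
    id: echo
    baseCommand: echo
    inputs:
      message:
        type: string                  # each job receives a single string
        inputBinding:
          position: 1
    outputs:
      out:
        type: stdout
    stdout: message.txt
```

Assuming the file is saved as scatter-echo.cwl (a hypothetical name), it could be run with cwltool scatter-echo.cwl --messages hello --messages world, which should create two echo jobs and return echoed as an array of two files.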
Scatter and ScatterMethod
- scatter: Specifies the input field(s) to be scattered.
- scatterMethod: Defines how the inputs are combined when more than one field is scattered (it is required in that case):
  - dotproduct: Matches corresponding elements of the input arrays (the first elements are processed together, then the second, and so on); the arrays must have the same length.
  - nested_crossproduct: Creates the Cartesian product of the inputs, running one job for every combination of elements; the output is a nested array with one level per scattered input.
  - flat_crossproduct: Like nested_crossproduct, but the output is flattened into a single array.
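To make the difference between the three methods concrete, here is a hypothetical step fragment that scatters two inputs at once. The step_pair step, the #pair tool and the letters/numbers inputs are invented for illustration; the comments list the jobs and the shape of the gathered result for each method, assuming letters is ["a", "b"] and numbers is [1, 2].

```yaml
# Hypothetical step: scatter two inputs at once.
# Assumes letters = ["a", "b"], numbers = [1, 2], and "#pair" is a hypothetical
# tool that takes one letter and one number and produces "result".
step_pair:
  in:
    letter: letters
    number: numbers
  out:
    - result
  run: "#pair"
  scatter: [letter, number]
  # dotproduct:          2 jobs (a,1), (b,2); arrays must have equal length
  #                      result: [r(a,1), r(b,2)]
  # nested_crossproduct: 4 jobs; outer level follows the first scattered input
  #                      result: [[r(a,1), r(a,2)], [r(b,1), r(b,2)]]
  # flat_crossproduct:   4 jobs, same order, one flat array
  #                      result: [r(a,1), r(a,2), r(b,1), r(b,2)]
  scatterMethod: nested_crossproduct
```

With nested_crossproduct, a workflow output that collects result must be declared as a nested array (for example type: {type: array, items: {type: array, items: File}} if result is a File); with dotproduct or flat_crossproduct, a flat array such as File[] is enough.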
step_rgb_composite:
  in:
    stac-item: stac-items
    bands: bands
  out:
    - rgb-tif
  run: "#rgb-composite"
  scatter: stac-item
  scatterMethod: dotproduct
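- scatter: stac-item: The stac-item input is scattered over the stac-items array, creating one subworkflow execution per item.
- scatterMethod: dotproduct: With only one scattered input, the method has no effect (it is required only when more than one input is scattered). The bands input is not scattered, so every subworkflow execution receives the same full bands array.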
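If you also wanted one composite per band combination, both inputs could be scattered. The following is a hypothetical variant of the main workflow's inputs, outputs and step, not part of the original guide: band-sets is an invented input name, and the second band set assumes the #stac tool can resolve a band with the common name nir. With flat_crossproduct, two stac-items and two band sets would yield four subworkflow runs gathered into a single File[] array.

```yaml
# Hypothetical variant of the main workflow: one composite per
# (stac-item, band set) combination. Only the changed sections are shown.
inputs:
  stac-items:
    type: string[]
  band-sets:
    # hypothetical input: an array of band-name arrays (string[][])
    type:
      type: array
      items:
        type: array
        items: string
    # the "nir" set assumes the #stac tool can resolve that common name
    default: [["red", "green", "blue"], ["nir", "red", "green"]]
outputs:
  rgb-tif:
    outputSource: step_rgb_composite/rgb-tif
    type: File[]   # flat_crossproduct gathers all runs into one flat array
steps:
  step_rgb_composite:
    in:
      stac-item: stac-items
      bands: band-sets   # each run receives one band-name array
    out:
      - rgb-tif
    run: "#rgb-composite"
    scatter: [stac-item, bands]
    scatterMethod: flat_crossproduct
```

Switching to nested_crossproduct would instead group the results per stac-item, which would require declaring rgb-tif as a nested array of files.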
Steps
- Define the Workflow
The main workflow (scatter-workflows.cwl) processes multiple stac-items, running the rgb-composite subworkflow once per item:
Workflow Definition:
class: Workflow
id: main
requirements:
  SubworkflowFeatureRequirement: {}
  InlineJavascriptRequirement: {}
  NetworkAccess:
    networkAccess: true
  ScatterFeatureRequirement: {}
inputs:
  stac-items:
    type: string[]
  bands:
    type: string[]
    default: ["red", "green", "blue"]
outputs:
  rgb-tif:
    outputSource: step_rgb_composite/rgb-tif
    type: File[]
steps:
  step_rgb_composite:
    in:
      stac-item: stac-items
      bands: bands
    out:
      - rgb-tif
    run: "#rgb-composite"
    scatter: stac-item
    scatterMethod: dotproduct
[Figure: graphical representation of the main workflow]
Inputs:
- stac-items: An array of URLs to STAC items.
- bands: Array of band names (default: ["red", "green", "blue"]).
Output:
- rgb-tif: An array of RGB TIFF files (one per stac-item).
Steps:
- step_rgb_composite: Calls the rgb-composite subworkflow once for each scattered stac-item.
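Instead of passing each input on the command line, the inputs listed above can also be supplied in a job file. The file name params.yml is an assumption for this sketch; the URLs are the two STAC items used in this guide.

```yaml
# Hypothetical job file (params.yml) for scatter-workflows.cwl.
# Run with: cwltool scatter-workflows.cwl params.yml
stac-items:
  - https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A
  - https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_53HPA_20210728_0_L2A
# bands may be omitted to fall back to the default ["red", "green", "blue"]
bands:
  - red
  - green
  - blue
```

Keeping the inputs in a file makes it easy to scatter over many more STAC items without a long command line.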
- Define the Subworkflow
The subworkflow (rgb-composite) processes a single stac-item to create an RGB composite:
Subworkflow Definition:
class: Workflow
id: rgb-composite
requirements:
  InlineJavascriptRequirement: {}
  NetworkAccess:
    networkAccess: true
  ScatterFeatureRequirement: {}
inputs:
  stac-item:
    type: string
  bands:
    type: string[]
outputs:
  rgb-tif:
    outputSource: step_color/rgb
    type: File
steps:
  step_curl:
    in:
      stac_item: stac-item
      common_band_name: bands
    out:
      - hrefs
    run: "#stac"
    scatter: common_band_name
    scatterMethod: dotproduct
  step_stack:
    in:
      tiffs:
        source: step_curl/hrefs
    out:
      - stacked
    run: "#rio_stack"
  step_color:
    in:
      stacked:
        source: step_stack/stacked
    out:
      - rgb
    run: "#rio_color"
[Figure: graphical representation of the rgb-composite subworkflow]
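Scatter in Subworkflow:
- step_curl scatters its common_band_name input, which receives the bands array (e.g., red, green, blue), so it runs once per band.
- Each band is processed independently to retrieve the href of its corresponding asset; the gathered hrefs array is passed to step_stack as tiffs.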
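The #stac tool called by step_curl is not shown in this guide. The following sketch is reconstructed from the step_curl jobs in the run log (curl in the docker.io/curlimages/curl:latest image, with the STAC item written to a file named message). The JavaScript that picks the asset href for common_band_name is an assumption about the STAC item layout (single-band assets carrying an eo:bands entry with a common_name), not the author's definition.

```yaml
# Sketch of the "#stac" tool, reconstructed from the step_curl jobs in the run log.
# The asset lookup in outputEval is an assumption about the STAC item layout.
class: CommandLineTool
id: stac
requirements:
  DockerRequirement:
    dockerPull: docker.io/curlimages/curl:latest
  InlineJavascriptRequirement: {}
  NetworkAccess:
    networkAccess: true
baseCommand: curl
stdout: message              # the STAC item JSON is saved to "message"
inputs:
  stac_item:
    type: string
    inputBinding:
      position: 1
  common_band_name:
    type: string             # not on the command line; only read in outputEval
outputs:
  hrefs:
    type: string
    outputBinding:
      glob: message
      loadContents: true     # exposes the file text as self[0].contents
      outputEval: |
        ${
          var item = JSON.parse(self[0].contents);
          for (var key in item.assets) {
            var eoBands = item.assets[key]["eo:bands"] || [];
            // only single-band assets, so multi-band previews are skipped
            if (eoBands.length === 1 &&
                eoBands[0].common_name === inputs.common_band_name) {
              return item.assets[key].href;
            }
          }
          return null;
        }
```

Because common_band_name has no inputBinding, it never reaches the curl command line, which is consistent with the three identical curl commands per STAC item in the log.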
- Run the Workflow
Execute the main workflow with multiple stac-items (by default, cwltool runs the scattered jobs one after another, as the log below shows; its --parallel option runs independent jobs concurrently):
cwltool scatter-workflows.cwl \
  --stac-items https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A \
  --stac-items https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_53HPA_20210728_0_L2A
INFO /opt/hostedtoolcache/Python/3.12.8/x64/bin/cwltool 3.1.20241217163858
INFO Resolved '../cwl-workflows/scatter-workflows.cwl' to 'file:///home/runner/work/how-to/how-to/cwl-workflows/scatter-workflows.cwl'
INFO [workflow ] start
INFO [workflow ] starting step step_rgb_composite
INFO [step step_rgb_composite] start
INFO [workflow step_rgb_composite] start
INFO [workflow step_rgb_composite] starting step step_curl
INFO [step step_curl] start
INFO [job step_curl] /tmp/pcr0pe9p$ docker \
run \
-i \
--mount=type=bind,source=/tmp/pcr0pe9p,target=/mzcLUT \
--mount=type=bind,source=/tmp/hx7blntq,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/3puo_zdp/20250102120642-755230.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/pcr0pe9p/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 35965 0 --:--:-- --:--:-- --:--:-- 36014
INFO [job step_curl] completed success
INFO [step step_curl] start
INFO [job step_curl_2] /tmp/ejn4wf_w$ docker \
run \
-i \
--mount=type=bind,source=/tmp/ejn4wf_w,target=/mzcLUT \
--mount=type=bind,source=/tmp/xf7gy5vz,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/ugde7a3m/20250102120643-766089.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/ejn4wf_w/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 33543 0 --:--:-- --:--:-- --:--:-- 33629
INFO [job step_curl_2] completed success
INFO [step step_curl] start
INFO [job step_curl_3] /tmp/ncb68w_r$ docker \
run \
-i \
--mount=type=bind,source=/tmp/ncb68w_r,target=/mzcLUT \
--mount=type=bind,source=/tmp/kb4ml611,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/cn5yraam/20250102120644-777095.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/ncb68w_r/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 51201 0 --:--:-- --:--:-- --:--:-- 51292
INFO [job step_curl_3] completed success
INFO [step step_curl] completed success
INFO [workflow step_rgb_composite] starting step step_stack
INFO [step step_stack] start
INFO [job step_stack] /tmp/loptcd_q$ docker \
run \
-i \
--mount=type=bind,source=/tmp/loptcd_q,target=/mzcLUT \
--mount=type=bind,source=/tmp/mrtcihrx,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/8e4pdddv/20250102120645-801568.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
--env=CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif \
--env=GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES \
--env=GDAL_TIFF_INTERNAL_MASK=YES \
ghcr.io/eoap/how-to/rio:1.0.0 \
rio \
stack \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B04.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B03.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B02.tif \
stacked.tif
INFO [job step_stack] Max memory used: 1243MiB
INFO [job step_stack] completed success
INFO [step step_stack] completed success
INFO [workflow step_rgb_composite] starting step step_color
INFO [step step_color] start
INFO [job step_color] /tmp/h84o_gln$ docker \
run \
-i \
--mount=type=bind,source=/tmp/h84o_gln,target=/mzcLUT \
--mount=type=bind,source=/tmp/77zifmj1,target=/tmp \
--mount=type=bind,source=/tmp/loptcd_q/stacked.tif,target=/var/lib/cwl/stg45728c5b-19b5-4049-bb91-db08a1654c3d/stacked.tif,readonly \
--workdir=/mzcLUT \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/a27dy0g6/20250102120712-654312.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
ghcr.io/eoap/how-to/rio:1.0.0 \
rio \
color \
-j \
-1 \
--out-dtype \
uint8 \
/var/lib/cwl/stg45728c5b-19b5-4049-bb91-db08a1654c3d/stacked.tif \
rgb.tif \
'gamma 3 0.95, sigmoidal rgb 35 0.13'
INFO [job step_color] Max memory used: 722MiB
INFO [job step_color] completed success
INFO [step step_color] completed success
INFO [workflow step_rgb_composite] completed success
INFO [step step_rgb_composite] start
INFO [workflow step_rgb_composite_2] start
INFO [workflow step_rgb_composite_2] starting step step_curl_2
INFO [step step_curl_2] start
INFO [job step_curl_4] /tmp/6ldnbyhj$ docker \
run \
-i \
--mount=type=bind,source=/tmp/6ldnbyhj,target=/mzcLUT \
--mount=type=bind,source=/tmp/ad5kdw6v,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/uj5e3n3z/20250102120719-075149.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_53HPA_20210728_0_L2A > /tmp/6ldnbyhj/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10157 100 10157 0 0 60416 0 --:--:-- --:--:-- --:--:-- 60458
INFO [job step_curl_4] completed success
INFO [step step_curl_2] start
INFO [job step_curl_5] /tmp/nuhuoamv$ docker \
run \
-i \
--mount=type=bind,source=/tmp/nuhuoamv,target=/mzcLUT \
--mount=type=bind,source=/tmp/ap5dz9jk,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/b_c4oyta/20250102120720-085926.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_53HPA_20210728_0_L2A > /tmp/nuhuoamv/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10157 100 10157 0 0 57578 0 --:--:-- --:--:-- --:--:-- 57710
INFO [job step_curl_5] completed success
INFO [step step_curl_2] start
INFO [job step_curl_6] /tmp/lau3wh1c$ docker \
run \
-i \
--mount=type=bind,source=/tmp/lau3wh1c,target=/mzcLUT \
--mount=type=bind,source=/tmp/t602x_ho,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/lr39jf1g/20250102120721-096231.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_53HPA_20210728_0_L2A > /tmp/lau3wh1c/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10157 100 10157 0 0 65452 0 --:--:-- --:--:-- --:--:-- 65529
INFO [job step_curl_6] completed success
INFO [step step_curl_2] completed success
INFO [workflow step_rgb_composite_2] starting step step_stack_2
INFO [step step_stack_2] start
INFO [job step_stack_2] /tmp/0o4l8rbf$ docker \
run \
-i \
--mount=type=bind,source=/tmp/0o4l8rbf,target=/mzcLUT \
--mount=type=bind,source=/tmp/x7jygl3d,target=/tmp \
--workdir=/mzcLUT \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/jiwc20s3/20250102120722-108083.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
--env=CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif \
--env=GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES \
--env=GDAL_TIFF_INTERNAL_MASK=YES \
ghcr.io/eoap/how-to/rio:1.0.0 \
rio \
stack \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2A_53HPA_20210728_0_L2A/B04.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2A_53HPA_20210728_0_L2A/B03.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2A_53HPA_20210728_0_L2A/B02.tif \
stacked.tif
INFO [job step_stack_2] Max memory used: 1270MiB
INFO [job step_stack_2] completed success
INFO [step step_stack_2] completed success
INFO [workflow step_rgb_composite_2] starting step step_color_2
INFO [step step_color_2] start
INFO [job step_color_2] /tmp/om2uygsu$ docker \
run \
-i \
--mount=type=bind,source=/tmp/om2uygsu,target=/mzcLUT \
--mount=type=bind,source=/tmp/vm0cko6t,target=/tmp \
--mount=type=bind,source=/tmp/0o4l8rbf/stacked.tif,target=/var/lib/cwl/stg6315db6f-e1db-488e-b4a2-715140bfc6d5/stacked.tif,readonly \
--workdir=/mzcLUT \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/iyun0vqa/20250102120806-499772.cid \
--env=TMPDIR=/tmp \
--env=HOME=/mzcLUT \
ghcr.io/eoap/how-to/rio:1.0.0 \
rio \
color \
-j \
-1 \
--out-dtype \
uint8 \
/var/lib/cwl/stg6315db6f-e1db-488e-b4a2-715140bfc6d5/stacked.tif \
rgb.tif \
'gamma 3 0.95, sigmoidal rgb 35 0.13'
INFO [job step_color_2] Max memory used: 714MiB
INFO [job step_color_2] completed success
INFO [step step_color_2] completed success
INFO [workflow step_rgb_composite_2] completed success
INFO [step step_rgb_composite] completed success
INFO [workflow ] completed success
INFO Final process status is success
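The step_stack and step_color jobs in the log above show the command lines built by the #rio_stack and #rio_color tools, which the guide does not list. The following $graph entries are a sketch reconstructed from those commands: the image, environment variables and arguments are copied from the log, while the input and output bindings are assumptions, not the author's definitions.

```yaml
# Sketch of the two rio tools, reconstructed from the step_stack and step_color
# docker commands in the run log; the bindings are assumptions.
- class: CommandLineTool
  id: rio_stack
  requirements:
    DockerRequirement:
      dockerPull: ghcr.io/eoap/how-to/rio:1.0.0
    EnvVarRequirement:
      envDef:
        CPL_VSIL_CURL_ALLOWED_EXTENSIONS: ".tif"
        GDAL_HTTP_MERGE_CONSECUTIVE_RANGES: "YES"   # quoted so YAML keeps a string
        GDAL_TIFF_INTERNAL_MASK: "YES"
    NetworkAccess:
      networkAccess: true     # the band COGs are read over HTTP
  baseCommand: [rio, stack]
  arguments:
    - valueFrom: stacked.tif  # output file name goes after the input URLs
      position: 2
  inputs:
    tiffs:
      type: string[]          # one asset href per band, gathered from step_curl
      inputBinding:
        position: 1
  outputs:
    stacked:
      type: File
      outputBinding:
        glob: stacked.tif

- class: CommandLineTool
  id: rio_color
  requirements:
    DockerRequirement:
      dockerPull: ghcr.io/eoap/how-to/rio:1.0.0
  baseCommand: [rio, color, "-j", "-1", "--out-dtype", uint8]
  arguments:
    - valueFrom: rgb.tif
      position: 2
    - valueFrom: "gamma 3 0.95, sigmoidal rgb 35 0.13"  # passed as one argument
      position: 3
  inputs:
    stacked:
      type: File
      inputBinding:
        position: 1
  outputs:
    rgb:
      type: File
      outputBinding:
        glob: rgb.tif
```

Every subworkflow run writes the same fixed file names (stacked.tif, rgb.tif), which is why the final outputs need to be disambiguated when they are collected.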
- Expected Output
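The workflow creates an array of RGB composite TIFF files, one for each stac-item. Both subworkflow runs name their output rgb.tif, so cwltool stores the second file as rgb.tif_2 in the output directory while its basename stays rgb.tif: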
Output Files:
- rgb-tif[0]: RGB composite for the first stac-item.
- rgb-tif[1]: RGB composite for the second stac-item.
{
  "rgb-tif": [
    {
      "location": "file:///home/runner/work/how-to/how-to/docs/rgb.tif",
      "basename": "rgb.tif",
      "class": "File",
      "checksum": "sha1$a1b1489c052e7154a3e9f1b4263fe5e2433e706e",
      "size": 361747464,
      "path": "/home/runner/work/how-to/how-to/docs/rgb.tif"
    },
    {
      "location": "file:///home/runner/work/how-to/how-to/docs/rgb.tif_2",
      "basename": "rgb.tif",
      "class": "File",
      "checksum": "sha1$9583b974ef960a423d55c0adfeb93d4baacf83c6",
      "size": 361747464,
      "path": "/home/runner/work/how-to/how-to/docs/rgb.tif_2"
    }
  ]
}
Key Takeaways
1. ScatterFeatureRequirement:
   - Enables a step to run once per element of an array input, so the resulting jobs can be processed in parallel.
2. Scatter and ScatterMethod:
   - scatter: Specifies the input(s) to scatter.
   - scatterMethod: dotproduct matches corresponding elements in input arrays; nested_crossproduct generates all combinations of input elements, and flat_crossproduct does the same with a flattened output array.
3. Scalable Workflows:
   - Scattering simplifies large-scale processing by distributing tasks over multiple inputs, which can improve throughput and keeps each step modular.
This guide demonstrates how scattering in CWL can streamline workflows by enabling parallel execution of steps on array inputs.