How-to Guide: Scattering on Input Parameters
This guide explains how to scatter workflow steps based on input parameters using ScatterFeatureRequirement and MultipleInputFeatureRequirement.
The focus is on creating workflows where multiple input parameters are processed in parallel.
Objective
Scatter the step_curl task to process multiple input bands (red, green, blue) in parallel and combine their results in subsequent steps.
Key Features
ScatterFeatureRequirement
Enables scattering, allowing parallel execution of workflow steps for array-like inputs.
requirements:
  ScatterFeatureRequirement: {}
MultipleInputFeatureRequirement
Allows a single step input to list more than one source; the linked values are merged into one array.
requirements:
  MultipleInputFeatureRequirement: {}
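As a rough illustration of what this requirement makes possible, the following standalone Node.js sketch mimics how several sources listed on one step input end up as a single array (CWL's default merge_nested behaviour). The mergeNested helper and the placeholder STAC URL are assumptions made for this example; they are not part of cwltool.

```javascript
// Hypothetical helper (not a cwltool API): resolves each listed source
// against the workflow inputs and returns the values as one array,
// one entry per source, in the order the sources are listed.
function mergeNested(workflowInputs, sources) {
  return sources.map((name) => {
    if (!(name in workflowInputs)) {
      throw new Error(`Unknown source: ${name}`);
    }
    return workflowInputs[name];
  });
}

// Workflow inputs with the defaults from this guide; the URL is a placeholder.
const workflowInputs = {
  "stac-item": "https://example.com/stac-item.json",
  "red-band": "red",
  "green-band": "green",
  "blue-band": "blue",
};

const commonBandName = mergeNested(workflowInputs, [
  "red-band",
  "green-band",
  "blue-band",
]);
console.log(commonBandName); // [ 'red', 'green', 'blue' ]
```

The resulting array is what a scattered step would then iterate over.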
- Scatter on common_band_name
Scatters the step_curl task based on the common_band_name parameter array.
in:
  stac_item: stac-item
  common_band_name: [red-band, green-band, blue-band]
out:
  - hrefs
run: "#stac"
scatter: common_band_name
scatterMethod: dotproduct
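The effect of scattering over a single input can be sketched in plain JavaScript: the step runs once per array element, the remaining inputs are passed through unchanged, and the outputs are gathered into an array in the same order. The runStac function below is a hypothetical stand-in for the real tool, and the URL is a placeholder.

```javascript
// Hypothetical stand-in for one run of the "#stac" tool: the real tool
// calls curl and parses the STAC item; this only labels the job.
function runStac(jobInputs) {
  return `href for ${jobInputs.common_band_name} from ${jobInputs.stac_item}`;
}

// Sketch of scattering over one input: one job per array element, with
// the other inputs passed unchanged and the outputs gathered in order.
function scatter(stepInputs, scatterKey, tool) {
  return stepInputs[scatterKey].map((element) =>
    tool({ ...stepInputs, [scatterKey]: element })
  );
}

const hrefs = scatter(
  {
    stac_item: "https://example.com/stac-item.json", // placeholder URL
    common_band_name: ["red", "green", "blue"],
  },
  "common_band_name",
  runStac
);
console.log(hrefs.length); // 3
```

A workflow runner may execute the three jobs concurrently; the sketch runs them sequentially and only models how inputs are split and outputs collected.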
Steps
- Define the Workflow
The workflow uses default input values for the band names (red, green, blue) and processes them using scattering.
Workflow Inputs
The workflow accepts:
- stac-item: URL of a STAC item.
- red-band: Band name for the red channel (default: "red").
- green-band: Band name for the green channel (default: "green").
- blue-band: Band name for the blue channel (default: "blue").
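The way the defaults interact with user-supplied values can be sketched as follows. This is a simplified model, not cwltool's implementation: resolveInputs is a hypothetical helper, the URL is a placeholder, and the "nir" override is only an example value.

```javascript
// Defaults declared in the workflow inputs of this guide.
const DEFAULTS = { "red-band": "red", "green-band": "green", "blue-band": "blue" };

// Hypothetical helper: stac-item has no default and must be supplied;
// any band name the caller provides overrides the matching default.
function resolveInputs(provided) {
  if (typeof provided["stac-item"] !== "string") {
    throw new Error("stac-item is required");
  }
  return { ...DEFAULTS, ...provided };
}

const withDefaults = resolveInputs({
  "stac-item": "https://example.com/stac-item.json", // placeholder URL
});
const withOverride = resolveInputs({
  "stac-item": "https://example.com/stac-item.json",
  "red-band": "nir", // example override, assumed to exist in the item
});
console.log(withDefaults["red-band"], withOverride["red-band"]); // red nir
```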
Workflow Outputs
- The final output is the RGB composite TIFF file (rgb-tif).
Workflow Definition
cwlVersion: v1.2
$graph:
- class: Workflow
  id: main
  requirements:
    InlineJavascriptRequirement: {}
    NetworkAccess:
      networkAccess: true
    ScatterFeatureRequirement: {}
    MultipleInputFeatureRequirement: {}
  inputs:
    stac-item:
      type: string
    red-band:
      type: string
      default: "red"
    green-band:
      type: string
      default: "green"
    blue-band:
      type: string
      default: "blue"
  outputs:
    rgb-tif:
      outputSource: step_color/rgb
      type: File
  steps:
    step_curl:
      in:
        stac_item: stac-item
        common_band_name: [red-band, green-band, blue-band]
      out:
      - hrefs
      run: "#stac"
      scatter: common_band_name
      scatterMethod: dotproduct
    step_stack:
      in:
        tiffs:
          source: step_curl/hrefs
      out:
      - stacked
      run: "#rio_stack"
    step_color:
      in:
        stacked:
          source: step_stack/stacked
      out:
      - rgb
      run: "#rio_color"
- class: CommandLineTool
  id: stac
  requirements:
    DockerRequirement:
      dockerPull: docker.io/curlimages/curl:latest
  baseCommand: curl
  stdout: message
  arguments:
  - $( inputs.stac_item )
  inputs:
    stac_item:
      type: string
    common_band_name:
      type: string
  outputs:
    hrefs:
      type: string
      outputBinding:
        glob: message
        loadContents: true
        outputEval: |
          ${
            const assets = JSON.parse(self[0].contents).assets;
            const bandKey = Object.keys(assets).find(key =>
              assets[key]['eo:bands'] &&
              assets[key]['eo:bands'].length === 1 &&
              assets[key]['eo:bands'].some(band => band.common_name === inputs.common_band_name)
            );
            if (!bandKey) {
              throw new Error(`No valid asset found for band: ${inputs.common_band_name}`);
            }
            return assets[bandKey].href;
          }
- class: CommandLineTool
  id: rio_stack
  requirements:
    DockerRequirement:
      dockerPull: ghcr.io/eoap/how-to/rio:1.0.0
    EnvVarRequirement:
      envDef:
        GDAL_TIFF_INTERNAL_MASK: YES
        GDAL_HTTP_MERGE_CONSECUTIVE_RANGES: YES
        CPL_VSIL_CURL_ALLOWED_EXTENSIONS: ".tif"
    InitialWorkDirRequirement:
      listing:
      - entryname: run.sh
        entry: |-
          #!/bin/bash
          rio stack $@
  baseCommand: ["/bin/bash", "run.sh"]
  arguments:
  - valueFrom: "${ \n var arr = [];\n for(var i=0; i<inputs.tiffs.length; i++) {\n arr.push(inputs.tiffs[i]); \n }\n return arr; \n }\n"
  - stacked.tif
  inputs:
    tiffs:
      type: string[]
  outputs:
    stacked:
      type: File
      outputBinding:
        glob: stacked.tif
- class: CommandLineTool
  id: rio_color
  requirements:
    DockerRequirement:
      dockerPull: ghcr.io/eoap/how-to/rio:1.0.0
    InitialWorkDirRequirement:
      listing:
      - entryname: run.sh
        entry: |-
          #!/bin/bash
          rio color -j -1 --out-dtype uint8 $1 rgb.tif "gamma 3 0.95, sigmoidal rgb 35 0.13"
  baseCommand: ["/bin/bash", "run.sh"]
  arguments:
  - $( inputs.stacked.path )
  inputs:
    stacked:
      type: File
  outputs:
    rgb:
      type: File
      outputBinding:
        glob: rgb.tif
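The asset lookup performed by the outputEval expression of the stac tool can be tried outside CWL. The sketch below wraps the same selection rule in a function and runs it against a made-up, heavily trimmed STAC item; the asset keys and URLs in sampleItem are placeholders, not values from the real item.

```javascript
// Same selection rule as the outputEval expression: pick the asset that
// has exactly one eo:bands entry whose common_name matches the band.
function selectHref(item, commonBandName) {
  const assets = item.assets;
  const bandKey = Object.keys(assets).find(
    (key) =>
      assets[key]["eo:bands"] &&
      assets[key]["eo:bands"].length === 1 &&
      assets[key]["eo:bands"].some((band) => band.common_name === commonBandName)
  );
  if (!bandKey) {
    throw new Error(`No valid asset found for band: ${commonBandName}`);
  }
  return assets[bandKey].href;
}

// Made-up sample item (placeholder keys and URLs).
const sampleItem = {
  assets: {
    // No eo:bands at all: skipped.
    thumbnail: { href: "https://example.com/thumb.jpg" },
    // Three bands in one asset: skipped by the length === 1 check.
    visual: {
      href: "https://example.com/TCI.tif",
      "eo:bands": [
        { common_name: "red" },
        { common_name: "green" },
        { common_name: "blue" },
      ],
    },
    // Single-band asset: selected for "red".
    B04: {
      href: "https://example.com/B04.tif",
      "eo:bands": [{ common_name: "red" }],
    },
  },
};

console.log(selectHref(sampleItem, "red")); // https://example.com/B04.tif
```

The length check is what keeps a multi-band asset such as a true-color image from being returned when a single band is requested.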
Its graphical representation:
- Scatter Configuration
- Step 1: Scatter on common_band_name
The step_curl step scatters over the common_band_name parameter array ([red-band, green-band, blue-band]).
in:
  stac_item: stac-item
  common_band_name: [red-band, green-band, blue-band]
out:
  - hrefs
run: "#stac"
scatter: common_band_name
scatterMethod: dotproduct
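- scatter: Runs the step once for each value of common_band_name.
- scatterMethod: dotproduct: Pairs up elements by position when a step scatters over several input arrays. With a single scattered input, as here, each array element simply produces one job.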
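To make the dotproduct behaviour concrete, here is a small sketch of how per-job inputs could be built. It models the rule that all scattered arrays must have the same length and that job i takes element i from each of them. The dotproductJobs helper and the second scattered input (label) are hypothetical and do not appear in the workflow.

```javascript
// Hypothetical helper: builds the per-job inputs for a dotproduct scatter.
function dotproductJobs(scattered) {
  const keys = Object.keys(scattered);
  if (keys.length === 0) {
    return [];
  }
  const length = scattered[keys[0]].length;
  if (!keys.every((key) => scattered[key].length === length)) {
    throw new Error("dotproduct requires arrays of equal length");
  }
  // Job i takes element i from every scattered array.
  return Array.from({ length }, (_, i) =>
    Object.fromEntries(keys.map((key) => [key, scattered[key][i]]))
  );
}

// Single scattered input, as in this guide: one job per band.
const single = dotproductJobs({ common_band_name: ["red", "green", "blue"] });

// Two scattered inputs ("label" is a made-up second input): paired by index.
const paired = dotproductJobs({
  common_band_name: ["red", "green", "blue"],
  label: ["R", "G", "B"],
});

console.log(single.length); // 3
console.log(paired[1]); // { common_band_name: 'green', label: 'G' }
```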
Step 2: Combine Results
The step_stack step combines the TIFF files fetched by step_curl.
in:
  tiffs:
    source: step_curl/hrefs
out:
  - stacked
run: "#rio_stack"
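Because step_curl is scattered, step_curl/hrefs arrives at step_stack as an array of strings, and the rio_stack tool expands that array into one command-line argument per URL, followed by the output file name. The sketch below reproduces that expansion; buildStackCommand is a hypothetical helper, and the URLs are the ones shown in the run log of this guide.

```javascript
// Hypothetical helper mirroring the rio_stack command line:
// baseCommand, then every TIFF URL, then the output file name.
function buildStackCommand(tiffs) {
  return ["/bin/bash", "run.sh", ...tiffs, "stacked.tif"];
}

// Gathered output of the scattered step, in scatter order (red, green, blue).
const base =
  "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A";
const tiffs = ["B04", "B03", "B02"].map((band) => `${base}/${band}.tif`);

const argv = buildStackCommand(tiffs);
console.log(argv.join(" "));
```

The order of the URLs follows the order of the scattered band names, which is why the stacked file ends up with the bands in red, green, blue order.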
- Substeps and Tools
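stac Tool
- Fetches the STAC item with curl and extracts the URL of the TIFF asset matching each band.
rio_stack Tool
- Stacks the TIFF files into a single composite file.
rio_color Tool
- Applies color correction to the stacked file and writes the RGB composite (rgb.tif).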
- Run the Workflow
Execute the workflow with the default parameters:
cwltool scatter-input-parameters.cwl \
--stac-item https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A
INFO /opt/hostedtoolcache/Python/3.13.3/x64/bin/cwltool 3.1.20250110105449
INFO Resolved '../cwl-workflows/scatter-input-parameters.cwl' to 'file:///home/runner/work/how-to/how-to/cwl-workflows/scatter-input-parameters.cwl'
INFO [workflow ] start
INFO [workflow ] starting step step_curl
INFO [step step_curl] start
INFO [job step_curl] /tmp/pjim30c_$ docker \
run \
-i \
--mount=type=bind,source=/tmp/pjim30c_,target=/immwZn \
--mount=type=bind,source=/tmp/w076lvqq,target=/tmp \
--workdir=/immwZn \
--read-only=true \
--log-driver=none \
--user=1001:118 \
--rm \
--cidfile=/tmp/x8kvg4tx/20250620072043-011732.cid \
--env=TMPDIR=/tmp \
--env=HOME=/immwZn \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/pjim30c_/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 28588 0 --:--:-- --:--:-- --:--:-- 28528
100 10156 100 10156 0 0 28582 0 --:--:-- --:--:-- --:--:-- 28528
INFO [job step_curl] completed success
INFO [step step_curl] start
INFO [job step_curl_2] /tmp/9u3r00yd$ docker \
run \
-i \
--mount=type=bind,source=/tmp/9u3r00yd,target=/immwZn \
--mount=type=bind,source=/tmp/pmd8at18,target=/tmp \
--workdir=/immwZn \
--read-only=true \
--log-driver=none \
--user=1001:118 \
--rm \
--cidfile=/tmp/bj_7ok79/20250620072044-019069.cid \
--env=TMPDIR=/tmp \
--env=HOME=/immwZn \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/9u3r00yd/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 46062 0 --:--:-- --:--:-- --:--:-- 46163
INFO [job step_curl_2] completed success
INFO [step step_curl] start
INFO [job step_curl_3] /tmp/slsb0sya$ docker \
run \
-i \
--mount=type=bind,source=/tmp/slsb0sya,target=/immwZn \
--mount=type=bind,source=/tmp/xf1wb7_f,target=/tmp \
--workdir=/immwZn \
--read-only=true \
--log-driver=none \
--user=1001:118 \
--rm \
--cidfile=/tmp/qv_0qywn/20250620072045-026504.cid \
--env=TMPDIR=/tmp \
--env=HOME=/immwZn \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/slsb0sya/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 58413 0 --:--:-- --:--:-- --:--:-- 58705
INFO [job step_curl_3] completed success
INFO [step step_curl] completed success
INFO [workflow ] starting step step_stack
INFO [step step_stack] start
INFO [job step_stack] /tmp/eyszqcz_$ docker \
run \
-i \
--mount=type=bind,source=/tmp/eyszqcz_,target=/immwZn \
--mount=type=bind,source=/tmp/5rrkbtid,target=/tmp \
--workdir=/immwZn \
--read-only=true \
--user=1001:118 \
--rm \
--cidfile=/tmp/0xnzhw20/20250620072046-046081.cid \
--env=TMPDIR=/tmp \
--env=HOME=/immwZn \
--env=CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif \
--env=GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES \
--env=GDAL_TIFF_INTERNAL_MASK=YES \
ghcr.io/eoap/how-to/rio:1.0.0 \
/bin/bash \
run.sh \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B04.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B03.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B02.tif \
stacked.tif
INFO [job step_stack] Max memory used: 1249MiB
INFO [job step_stack] completed success
INFO [step step_stack] completed success
INFO [workflow ] starting step step_color
INFO [step step_color] start
INFO [job step_color] /tmp/d8xs8_i9$ docker \
run \
-i \
--mount=type=bind,source=/tmp/d8xs8_i9,target=/immwZn \
--mount=type=bind,source=/tmp/tn80e284,target=/tmp \
--mount=type=bind,source=/tmp/eyszqcz_/stacked.tif,target=/var/lib/cwl/stg8f936c6e-9afb-4620-b18a-09db34efa67d/stacked.tif,readonly \
--workdir=/immwZn \
--read-only=true \
--user=1001:118 \
--rm \
--cidfile=/tmp/yq40jdiv/20250620072104-328838.cid \
--env=TMPDIR=/tmp \
--env=HOME=/immwZn \
ghcr.io/eoap/how-to/rio:1.0.0 \
/bin/bash \
run.sh \
/var/lib/cwl/stg8f936c6e-9afb-4620-b18a-09db34efa67d/stacked.tif
INFO [job step_color] Max memory used: 746MiB
INFO [job step_color] completed success
INFO [step step_color] completed success
INFO [workflow ] completed success
INFO Final process status is success
- Expected Output
Intermediate Outputs:
- hrefs: URLs of the TIFF files for the red, green, and blue bands.
- stacked.tif: Composite TIFF file of all bands.
Final Output:
rgb-tif: RGB composite TIFF file.
{
"rgb-tif": {
"location": "file:///home/runner/work/how-to/how-to/docs/rgb.tif",
"basename": "rgb.tif",
"class": "File",
"checksum": "sha1$87fea93a525287654a6e23e5b031fdb64e379094",
"size": 361747464,
"path": "/home/runner/work/how-to/how-to/docs/rgb.tif"
}
}
Key Takeaways
Scattering with Input Parameters:
- The scatter field enables parallel execution over array inputs.
- scatterMethod: dotproduct pairs up corresponding elements when a step scatters over more than one array.
Multiple Input Fields:
- MultipleInputFeatureRequirement lets one step input list several workflow inputs as sources; they are merged into a single array that the step can then scatter over.
Parallel and Modular Design:
- Scattering simplifies workflows by enabling parallel processing of input parameters.
This guide demonstrates how to use scattering to process multiple input parameters in parallel within a CWL workflow.