How-to Guide: Scattering on Input Parameters
This guide explains how to scatter workflow steps based on input parameters using ScatterFeatureRequirement and MultipleInputFeatureRequirement.
The focus is on creating workflows where multiple input parameters are processed in parallel.
Objective
Scatter the step_curl task to process multiple input bands (red, green, blue) in parallel and combine their results in subsequent steps.
Key Features
- ScatterFeatureRequirement
  Enables scattering: a step runs once per element of an array input, and those runs may execute in parallel.
    requirements:
      ScatterFeatureRequirement: {}
- MultipleInputFeatureRequirement
  Allows a single step input to list several sources, which are merged into one array.
    requirements:
      MultipleInputFeatureRequirement: {}
- Scatter on common_band_name
  Scatters the step_curl task over the common_band_name array, which is built from the three band inputs.
    in:
      stac_item: stac-item
      common_band_name: [red-band, green-band, blue-band]
    out:
    - hrefs
    run: "#stac"
    scatter: common_band_name
    scatterMethod: dotproduct
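As a rough illustration of what a runner does with this step, the sketch below merges the three band inputs into one array and then creates one job per element. It is plain JavaScript rather than CWL: the helper names mergeSources and scatterJobs and the placeholder item URL are made up for this example, and merge_nested is assumed because it is the CWL default when linkMerge is not set.

```javascript
// Illustrative model only: these helpers are hypothetical and not part of any CWL runner.

// Workflow-level inputs, using the default band names from this guide.
const workflowInputs = {
  "stac-item": "https://example.com/stac/item.json", // placeholder URL
  "red-band": "red",
  "green-band": "green",
  "blue-band": "blue",
};

// Several sources on one step input are merged into a single array,
// one entry per source and in the order listed (merge_nested behaviour).
function mergeSources(inputs, sources) {
  return sources.map((name) => inputs[name]);
}

// Scattering over one input yields one job per array element;
// the non-scattered inputs are repeated unchanged in every job.
function scatterJobs(fixedInputs, scatterName, values) {
  return values.map((value) => ({ ...fixedInputs, [scatterName]: value }));
}

const bands = mergeSources(workflowInputs, ["red-band", "green-band", "blue-band"]);
const jobs = scatterJobs(
  { stac_item: workflowInputs["stac-item"] },
  "common_band_name",
  bands
);

console.log(bands); // [ 'red', 'green', 'blue' ]
console.log(jobs.length); // 3
```

Each element of jobs corresponds to one invocation of the stac tool, which is why the run log later in this guide shows three step_curl jobs.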
Steps
- Define the Workflow
The workflow uses default input values for the band names (red, green, blue) and processes them using scattering.
Workflow Inputs
The workflow accepts:
- stac-item: URL of a STAC item.
- red-band: Band name for the red channel (default: "red").
- green-band: Band name for the green channel (default: "green").
- blue-band: Band name for the blue channel (default: "blue").
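The effect of the defaults can be pictured as a lookup that falls back to the declared default when the job supplies no value. The sketch below is only a JavaScript model of that rule: resolveInputs is a hypothetical helper, the item URL is a placeholder, and the "nir" override is an arbitrary example value.

```javascript
// Defaults declared on the workflow inputs in this guide.
const inputDefaults = {
  "red-band": "red",
  "green-band": "green",
  "blue-band": "blue",
};

// Hypothetical helper: use the job value when present, else the default.
// A missing or null value counts as "not provided".
function resolveInputs(defaults, job) {
  const resolved = { ...defaults };
  for (const [name, value] of Object.entries(job)) {
    if (value !== null && value !== undefined) {
      resolved[name] = value;
    }
  }
  return resolved;
}

// The job supplies stac-item and overrides one band; the others keep their defaults.
const resolvedInputs = resolveInputs(inputDefaults, {
  "stac-item": "https://example.com/stac/item.json", // placeholder URL
  "red-band": "nir",
  "green-band": null,
});

console.log(resolvedInputs["red-band"]); // nir
console.log(resolvedInputs["green-band"]); // green
```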
Workflow Outputs
- The final output is the RGB composite TIFF file (rgb-tif).
Workflow Definition
cwlVersion: v1.2
$graph:
- class: Workflow
  id: main
  requirements:
    InlineJavascriptRequirement: {}
    NetworkAccess:
      networkAccess: true
    ScatterFeatureRequirement: {}
    MultipleInputFeatureRequirement: {}
  inputs:
    stac-item:
      type: string
    red-band:
      type: string
      default: "red"
    green-band:
      type: string
      default: "green"
    blue-band:
      type: string
      default: "blue"
  outputs:
    rgb-tif:
      outputSource: step_color/rgb
      type: File
  steps:
    step_curl:
      in:
        stac_item: stac-item
        common_band_name: [red-band, green-band, blue-band]
      out:
      - hrefs
      run: "#stac"
      scatter: common_band_name
      scatterMethod: dotproduct
    step_stack:
      in:
        tiffs:
          source: step_curl/hrefs
      out:
      - stacked
      run: "#rio_stack"
    step_color:
      in:
        stacked:
          source: step_stack/stacked
      out:
      - rgb
      run: "#rio_color"
- class: CommandLineTool
  id: stac
  requirements:
    DockerRequirement:
      dockerPull: docker.io/curlimages/curl:latest
  baseCommand: curl
  stdout: message
  arguments:
  - $( inputs.stac_item )
  inputs:
    stac_item:
      type: string
    common_band_name:
      type: string
  outputs:
    hrefs:
      type: string
      outputBinding:
        glob: message
        loadContents: true
        outputEval: |
          ${
            const assets = JSON.parse(self[0].contents).assets;
            const bandKey = Object.keys(assets).find(key =>
              assets[key]['eo:bands'] &&
              assets[key]['eo:bands'].length === 1 &&
              assets[key]['eo:bands'].some(band => band.common_name === inputs.common_band_name)
            );
            if (!bandKey) {
              throw new Error(`No valid asset found for band: ${inputs.common_band_name}`);
            }
            return assets[bandKey].href;
          }
- class: CommandLineTool
  id: rio_stack
  requirements:
    DockerRequirement:
      dockerPull: ghcr.io/eoap/how-to/rio:1.0.0
    EnvVarRequirement:
      envDef:
        GDAL_TIFF_INTERNAL_MASK: YES
        GDAL_HTTP_MERGE_CONSECUTIVE_RANGES: YES
        CPL_VSIL_CURL_ALLOWED_EXTENSIONS: ".tif"
    InitialWorkDirRequirement:
      listing:
      - entryname: run.sh
        entry: |-
          #!/bin/bash
          rio stack $@
  baseCommand: ["/bin/bash", "run.sh"]
  arguments:
  - valueFrom: "${ \n var arr = [];\n for(var i=0; i<inputs.tiffs.length; i++) {\n arr.push(inputs.tiffs[i]); \n }\n return arr; \n }\n"
  - stacked.tif
  inputs:
    tiffs:
      type: string[]
  outputs:
    stacked:
      type: File
      outputBinding:
        glob: stacked.tif
- class: CommandLineTool
  id: rio_color
  requirements:
    DockerRequirement:
      dockerPull: ghcr.io/eoap/how-to/rio:1.0.0
    InitialWorkDirRequirement:
      listing:
      - entryname: run.sh
        entry: |-
          #!/bin/bash
          rio color -j -1 --out-dtype uint8 $1 rgb.tif "gamma 3 0.95, sigmoidal rgb 35 0.13"
  baseCommand: ["/bin/bash", "run.sh"]
  arguments:
  - $( inputs.stacked.path )
  inputs:
    stacked:
      type: File
  outputs:
    rgb:
      type: File
      outputBinding:
        glob: rgb.tif
Its graphical representation:
- Scatter Configuration
Step 1: Scatter on common_band_name
The step_curl step scatters over the common_band_name array, which is assembled from the red-band, green-band, and blue-band workflow inputs ([red-band, green-band, blue-band]).
    in:
      stac_item: stac-item
      common_band_name: [red-band, green-band, blue-band]
    out:
    - hrefs
    run: "#stac"
    scatter: common_band_name
    scatterMethod: dotproduct
- scatter: Runs the step once for each value of common_band_name.
- scatterMethod: dotproduct: Pairs elements at the same index when several inputs are scattered together. Only common_band_name is scattered here, so the step simply runs once per band.
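To make the pairing rule concrete, here is a small JavaScript model of dotproduct scattering. It is a sketch under stated assumptions, not runner code: dotProductJobs is a hypothetical helper, and the second scattered input (resolution) does not exist in this workflow and is included only to show the pairing.

```javascript
// Hypothetical model of scatterMethod: dotproduct.
// `scattered` maps each scattered input name to its array of values.
function dotProductJobs(scattered) {
  const names = Object.keys(scattered);
  if (names.length === 0) {
    return [];
  }
  const length = scattered[names[0]].length;
  // dotproduct requires all scattered arrays to have the same length.
  for (const name of names) {
    if (scattered[name].length !== length) {
      throw new Error(`Length mismatch for scattered input: ${name}`);
    }
  }
  // Job i takes element i of every scattered array.
  return Array.from({ length }, (_, i) =>
    Object.fromEntries(names.map((name) => [name, scattered[name][i]]))
  );
}

// Two scattered inputs: elements are paired by position.
const paired = dotProductJobs({
  common_band_name: ["red", "green", "blue"],
  resolution: [10, 10, 20], // made-up second input, for illustration only
});
console.log(paired[2]); // { common_band_name: 'blue', resolution: 20 }

// One scattered input, as in step_curl: one job per element.
const single = dotProductJobs({ common_band_name: ["red", "green", "blue"] });
console.log(single.length); // 3
```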
Step 2: Combine Results
The step_stack step receives the array of TIFF URLs gathered from the scattered step_curl runs and stacks them into a single file.
    in:
      tiffs:
        source: step_curl/hrefs
    out:
    - stacked
    run: "#rio_stack"
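The hand-off between the two steps can be sketched as follows: each scattered run returns one href, the runner gathers them into an array in scatter order, and rio_stack places them on the command line ahead of the output file name. buildStackCommand is a hypothetical helper that mirrors the baseCommand and arguments of the rio_stack tool; the URLs are the ones that appear in the run log in this guide.

```javascript
// Outputs of the three scattered step_curl runs, gathered in scatter order (red, green, blue).
const gatheredHrefs = [
  "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B04.tif",
  "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B03.tif",
  "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B02.tif",
];

// Hypothetical helper mirroring rio_stack: every URL becomes its own argument,
// followed by the fixed output name.
function buildStackCommand(tiffs) {
  return ["/bin/bash", "run.sh", ...tiffs, "stacked.tif"];
}

const stackCommand = buildStackCommand(gatheredHrefs);
console.log(stackCommand.length); // 6
console.log(stackCommand[stackCommand.length - 1]); // stacked.tif
```

Because the array keeps the scatter order, the red band lands first in the stack, which is the order rio color expects for an RGB composite.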
- Substeps and Tools
stac Tool
- Fetches the STAC item and returns the URL of the TIFF asset that matches the requested band.
rio_stack Tool
- Stacks the TIFF files into a single composite file.
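The asset lookup that the stac tool performs in its outputEval expression can be tried outside CWL. The sketch below lifts the same logic into a standalone function; findBandHref is a name chosen for this example, and sampleItem is a made-up STAC item fragment with placeholder URLs, not the real item used in this guide.

```javascript
// Standalone version of the lookup in the stac tool's outputEval expression.
// Picks the asset that has exactly one eo:bands entry whose common_name matches.
function findBandHref(item, commonBandName) {
  const assets = item.assets;
  const bandKey = Object.keys(assets).find((key) => {
    const bands = assets[key]["eo:bands"];
    return (
      bands &&
      bands.length === 1 &&
      bands.some((band) => band.common_name === commonBandName)
    );
  });
  if (!bandKey) {
    throw new Error(`No valid asset found for band: ${commonBandName}`);
  }
  return assets[bandKey].href;
}

// Made-up item fragment (placeholder URLs). The multi-band "visual" asset is
// skipped because it lists three bands instead of one.
const sampleItem = {
  assets: {
    thumbnail: { href: "https://example.com/thumbnail.jpg" },
    visual: {
      href: "https://example.com/TCI.tif",
      "eo:bands": [
        { common_name: "red" },
        { common_name: "green" },
        { common_name: "blue" },
      ],
    },
    B04: { href: "https://example.com/B04.tif", "eo:bands": [{ common_name: "red" }] },
    B03: { href: "https://example.com/B03.tif", "eo:bands": [{ common_name: "green" }] },
    B02: { href: "https://example.com/B02.tif", "eo:bands": [{ common_name: "blue" }] },
  },
};

console.log(findBandHref(sampleItem, "red")); // https://example.com/B04.tif
```

The single-band check matters because composite assets can also carry a matching common_name; without it, the first matching key would win.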
- Run the Workflow
Execute the workflow with the default parameters:
cwltool scatter-input-parameters.cwl \
--stac-item https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A
INFO /opt/hostedtoolcache/Python/3.12.8/x64/bin/cwltool 3.1.20241217163858
INFO Resolved '../cwl-workflows/scatter-input-parameters.cwl' to 'file:///home/runner/work/how-to/how-to/cwl-workflows/scatter-input-parameters.cwl'
INFO [workflow ] start
INFO [workflow ] starting step step_curl
INFO [step step_curl] start
INFO [job step_curl] /tmp/ych5i1_7$ docker \
run \
-i \
--mount=type=bind,source=/tmp/ych5i1_7,target=/RWLLmT \
--mount=type=bind,source=/tmp/4kxl2o7a,target=/tmp \
--workdir=/RWLLmT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/99tp8wky/20250102120551-178050.cid \
--env=TMPDIR=/tmp \
--env=HOME=/RWLLmT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/ych5i1_7/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 25188 0 --:--:-- --:--:-- --:--:-- 25200
INFO [job step_curl] completed success
INFO [step step_curl] start
INFO [job step_curl_2] /tmp/_143krc8$ docker \
run \
-i \
--mount=type=bind,source=/tmp/_143krc8,target=/RWLLmT \
--mount=type=bind,source=/tmp/0utxu8gn,target=/tmp \
--workdir=/RWLLmT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/jt_o5alp/20250102120552-189977.cid \
--env=TMPDIR=/tmp \
--env=HOME=/RWLLmT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/_143krc8/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 32199 0 --:--:-- --:--:-- --:--:-- 32241
INFO [job step_curl_2] completed success
INFO [step step_curl] start
INFO [job step_curl_3] /tmp/zdsr8rxz$ docker \
run \
-i \
--mount=type=bind,source=/tmp/zdsr8rxz,target=/RWLLmT \
--mount=type=bind,source=/tmp/68hh0iok,target=/tmp \
--workdir=/RWLLmT \
--read-only=true \
--log-driver=none \
--user=1001:128 \
--rm \
--cidfile=/tmp/nkbnlgcy/20250102120553-200864.cid \
--env=TMPDIR=/tmp \
--env=HOME=/RWLLmT \
docker.io/curlimages/curl:latest \
curl \
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_53HPA_20210723_0_L2A > /tmp/zdsr8rxz/message
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10156 100 10156 0 0 40526 0 --:--:-- --:--:-- --:--:-- 40624
INFO [job step_curl_3] completed success
INFO [step step_curl] completed success
INFO [workflow ] starting step step_stack
INFO [step step_stack] start
INFO [job step_stack] /tmp/5gkpstq6$ docker \
run \
-i \
--mount=type=bind,source=/tmp/5gkpstq6,target=/RWLLmT \
--mount=type=bind,source=/tmp/haxr2vrb,target=/tmp \
--workdir=/RWLLmT \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/m17jsy5k/20250102120554-226789.cid \
--env=TMPDIR=/tmp \
--env=HOME=/RWLLmT \
--env=CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif \
--env=GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES \
--env=GDAL_TIFF_INTERNAL_MASK=YES \
ghcr.io/eoap/how-to/rio:1.0.0 \
/bin/bash \
run.sh \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B04.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B03.tif \
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/53/H/PA/2021/7/S2B_53HPA_20210723_0_L2A/B02.tif \
stacked.tif
INFO [job step_stack] Max memory used: 1208MiB
INFO [job step_stack] completed success
INFO [step step_stack] completed success
INFO [workflow ] starting step step_color
INFO [step step_color] start
INFO [job step_color] /tmp/3_xefnk5$ docker \
run \
-i \
--mount=type=bind,source=/tmp/3_xefnk5,target=/RWLLmT \
--mount=type=bind,source=/tmp/y0z6cu44,target=/tmp \
--mount=type=bind,source=/tmp/5gkpstq6/stacked.tif,target=/var/lib/cwl/stg0da48c82-4431-411b-9eb2-b2797bf3d1f7/stacked.tif,readonly \
--workdir=/RWLLmT \
--read-only=true \
--user=1001:128 \
--rm \
--cidfile=/tmp/d11x80lz/20250102120621-362976.cid \
--env=TMPDIR=/tmp \
--env=HOME=/RWLLmT \
ghcr.io/eoap/how-to/rio:1.0.0 \
/bin/bash \
run.sh \
/var/lib/cwl/stg0da48c82-4431-411b-9eb2-b2797bf3d1f7/stacked.tif
INFO [job step_color] Max memory used: 797MiB
INFO [job step_color] completed success
INFO [step step_color] completed success
INFO [workflow ] completed success
INFO Final process status is success
- Expected Output
Intermediate Outputs:
- hrefs: URLs of the TIFF files for the red, green, and blue bands.
- stacked.tif: Composite TIFF file of all bands.
Final Output:
- rgb-tif: RGB composite TIFF file.
{
"rgb-tif": {
"location": "file:///home/runner/work/how-to/how-to/docs/rgb.tif",
"basename": "rgb.tif",
"class": "File",
"checksum": "sha1$0defab2532830f710a4a31599daff8632ec8ce02",
"size": 361747464,
"path": "/home/runner/work/how-to/how-to/docs/rgb.tif"
}
}
Key Takeaways
Scattering with Input Parameters:
- The scatter field enables parallel execution over array inputs.
- scatterMethod: dotproduct ensures corresponding elements in arrays are processed together when more than one input is scattered.
Multiple Input Fields:
- MultipleInputFeatureRequirement allows several workflow inputs to be merged into the single array that a step scatters over.
Parallel and Modular Design:
- Scattering simplifies workflows by running one step definition once per input value, in parallel where the runner supports it.
This guide demonstrates how to use scattering to process multiple input parameters in parallel within a CWL workflow.