Process generator
This is an experimental feature and an unofficial extension to the CWL standards.
A process generator is a CWL Process type that executes a concrete CWL process (CommandLineTool, Workflow or ExpressionTool) which produces CWL files as output, then executes the CWL that was generated.
The intention is to have a formalized way to express a pre-processing or bootstrapping step in which a CWL description is generated by another program (such as from a template, or conversion from another workflow language).
The ProcessGenerator is a subtype of CWL process, so it must define its inputs and outputs. The “run” field is similar to the “run” field of a workflow step – it specifies a tool to run that will create new CWL as output.
- name: ProcessGenerator
  type: record
  inVocab: true
  extends: cwl:Process
  documentRoot: true
  fields:
    - name: class
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"
      type: string
    - name: run
      type: [string, cwl:Process]
      jsonldPredicate:
        _id: "cwl:run"
        _type: "@id"
        subscope: run
      doc: |
        Specifies the process to run.
Process generator example (pytoolgen.cwl)
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
class: cwltool:ProcessGenerator
inputs:
  script: string
  dir: Directory
outputs: {}
run:
  class: CommandLineTool
  inputs:
    script: string
    dir: Directory
  outputs:
    runProcess:
      type: File
      outputBinding:
        glob: main.cwl
  requirements:
    InlineJavascriptRequirement: {}
    cwltool:LoadListingRequirement:
      loadListing: shallow_listing
    InitialWorkDirRequirement:
      listing: |
        ${
            var v = inputs.dir.listing;
            v.push({entryname: "inp.py", entry: inputs.script});
            return v;
        }
  arguments: [python, inp.py]
  stdout: main.cwl
The process generator has two required inputs: “script” and “dir”. It runs the command line tool listed inline in “run” with the input object, which is required to have those parameters. Note: the input object may contain additional parameters which are intended for the generated CWL when it is executed.
The command line tool populates the working directory using InitialWorkDirRequirement. It takes the listing from “dir” and adds a new file literal called “inp.py” which contains the text from the input parameter “script”. Then it runs “python inp.py”.
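The JavaScript expression in InitialWorkDirRequirement can be read as the Python sketch below. It is only meant to illustrate the data transformation; build_listing is a hypothetical helper, not part of cwltool, and the sample file name is made up.

```python
def build_listing(inputs):
    """Return the staged entries: the directory's listing plus an inp.py literal."""
    # Copy the list so the caller's input object is left untouched (the
    # JavaScript expression in pytoolgen.cwl appends to inputs.dir.listing
    # in place instead).
    listing = list(inputs["dir"].get("listing", []))
    # File literal: "entryname" is the file name, "entry" is its text content.
    listing.append({"entryname": "inp.py", "entry": inputs["script"]})
    return listing


# Illustrative input object; the file name below is made up.
example_inputs = {
    "script": "print('hello')\n",
    "dir": {
        "class": "Directory",
        "location": ".",
        "listing": [{"class": "File", "location": "data.txt"}],
    },
}
staged = build_listing(example_inputs)
```

The last entry of the result is the “inp.py” literal; everything before it comes from the directory listing.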
The output of this command line tool is the File parameter “runProcess”. In this example, the “inp.py” script, when run, is expected to print the CWL description to standard output, which will be captured in the “runProcess” output parameter.
Next, the ProcessGenerator will load the file in the “runProcess” parameter, which in this example is “main.cwl”. Finally, it will execute the loaded process with the input object that was originally provided to the process generator.
The output of the generated process is used as the output of the ProcessGenerator as a whole.
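The two-phase flow described above can be sketched in Python as follows. This is a conceptual model under stated assumptions, not cwltool's implementation: run_process_generator, fake_generator and fake_loader are hypothetical names, and the stand-ins replace the real tool execution and CWL loading.

```python
def run_process_generator(run_generator, load_process, job_order):
    """Two-phase execution: generate a process, then run it with the same inputs."""
    # Phase 1: run the embedded tool; its output object holds the generated file.
    generator_output = run_generator(job_order)
    # Phase 2: load the generated description from the "runProcess" output ...
    generated = load_process(generator_output["runProcess"])
    # ... and execute it with the input object originally given to the generator.
    return generated(job_order)


def fake_generator(job_order):
    # Stand-in for running "python inp.py > main.cwl".
    return {"runProcess": {"class": "File", "basename": "main.cwl"}}


def fake_loader(file_object):
    # Stand-in for loading main.cwl; returns a callable "process".
    def echo_tool(job_order):
        return "echo " + job_order["zing"]
    return echo_tool


result = run_process_generator(
    fake_generator, fake_loader, {"script": "print('...')", "zing": "blurf"}
)
```

Note that the same input object is passed to both phases, which is why it may carry parameters (here “zing”) that only the generated process uses.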
Here’s an example (zing.cwl) that uses pytoolgen.cwl.
#!/usr/bin/env cwltool
{cwl:tool: pytoolgen.cwl, script: {$include: "#attachment-1"}, dir: {class: Directory, location: .}}
--- |
import os
import sys
print("""
cwlVersion: v1.0
class: CommandLineTool
inputs:
  zing: string
outputs: {}
arguments: [echo, $(inputs.zing)]
""")
The first line, #!/usr/bin/env cwltool, means that this file can be given the executable bit (+x) and then run directly.
This is a multi-part YAML file. The first section is a CWL input object.
The input object uses “cwl:tool” to indicate that this input object should be used as input to execute “pytoolgen.cwl”.
The parameter script: {$include: "#attachment-1"} takes the text from the second part of the file (following the YAML division marker --- |) and assigns it as a string value to “script”.
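To make the two-part layout concrete, the sketch below splits such a file at the division marker using plain string handling. It is an approximation for illustration only: split_multipart is a hypothetical helper, it is not a YAML parser, and it is not how cwltool itself resolves "#attachment-1". The sample script body is shortened.

```python
def split_multipart(text, marker="--- |"):
    """Split text into (first_part, attachment) at the first marker line."""
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.strip() == marker:
            # Everything before the marker is the input object; everything
            # after it is the attachment text.
            return "".join(lines[:index]), "".join(lines[index + 1:])
    # No marker found: the whole text is the first part, no attachment.
    return text, None


zing_text = (
    "#!/usr/bin/env cwltool\n"
    '{cwl:tool: pytoolgen.cwl, script: {$include: "#attachment-1"}, '
    "dir: {class: Directory, location: .}}\n"
    "--- |\n"
    "import os\n"
    "import sys\n"
    'print("hello")\n'
)
input_part, script = split_multipart(zing_text)
```

Here input_part holds the shebang line and the input object, and script holds the Python text that ends up in “inp.py”.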
The “dir” parameter is not doing much in this example, but by capturing the whole directory it allows the Python script to refer to files in the current directory.
In this example the script is trivially printing CWL as a string, but of course could do something much more complex: generate code from a template, select among several possible workflows based on the input, convert from another workflow language, etc.
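As one illustration of a slightly less trivial generator, a script could build the description as a data structure and print it as JSON, which is valid YAML and therefore loadable as CWL. This is a sketch, not taken from the original example: make_echo_tool and its input_names parameter are hypothetical.

```python
import json


def make_echo_tool(input_names):
    """Build a CommandLineTool description that echoes the named string inputs."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "inputs": {name: "string" for name in input_names},
        "outputs": {},
        # One parameter reference per input, e.g. $(inputs.zing)
        "arguments": ["echo"] + ["$(inputs.%s)" % name for name in input_names],
    }


tool = make_echo_tool(["zing"])
# Printing to standard output is what the inline tool captures as main.cwl.
print(json.dumps(tool, indent=2))
```

Building a dictionary and serializing it avoids hand-escaping braces and dollar signs, which becomes error-prone once the description is assembled from string templates.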
When this is executed, the following steps happen:
1. pytoolgen.cwl is loaded and executed with the first part of the file as the input object.
2. The “script” parameter contains the contents of the second part. The inline command line tool creates a file called “inp.py” with the contents of “script”.
3. The inline command line tool runs python on “inp.py” and collects the output, which is the CWL description for a trivial “echo” tool.
4. The process generator loads the generated CWL description and executes it with any additional parameters declared in the input object or on the command line.
Example runs
Note: this requires the cwltool flags --enable-ext and --enable-dev. You can set these with the environment variable CWLTOOL_OPTIONS:
$ export CWLTOOL_OPTIONS="--enable-dev --enable-ext"
$ ./zing.cwl
INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
INFO [job d3626216-d7d8-4322-bc21-4d469634cc9a] /tmp/8sez90gb$ python \
    inp.py > /tmp/8sez90gb/main.cwl
INFO [job d3626216-d7d8-4322-bc21-4d469634cc9a] completed success
usage: ./zing.cwl [-h] --zing ZING [job_order]
./zing.cwl: error: the following arguments are required: --zing
$ ./zing.cwl --zing blurf
INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
INFO [job a580b69d-2b88-4268-904e-ed105ba7c85e] /tmp/ujff239o$ python \
    inp.py > /tmp/ujff239o/main.cwl
INFO [job a580b69d-2b88-4268-904e-ed105ba7c85e] completed success
INFO [job main.cwl] /tmp/f_7bxncq$ echo \
    blurf
blurf
INFO [job main.cwl] completed success
{
    "runProcess": {
        "location": "file:///home/peter/work/cwltool/tests/wf/generator/main.cwl",
        "basename": "main.cwl",
        "class": "File",
        "checksum": "sha1$8c160b680fb2cededef3228a53425e595b8cdf48",
        "size": 111,
        "path": "/home/peter/work/cwltool/tests/wf/generator/main.cwl"
    }
}
INFO Final process status is success
$ echo "zing: zoop" > job.yml
$ ./zing.cwl job.yml
INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
INFO [job 9073a083-dc79-4719-8762-1c024480605c] /tmp/meeo3d19$ python \
    inp.py > /tmp/meeo3d19/main.cwl
INFO [job 9073a083-dc79-4719-8762-1c024480605c] completed success
INFO [job main.cwl] /tmp/2pqdz5nq$ echo \
    zoop
zoop
INFO [job main.cwl] completed success
{
    "runProcess": {
        "location": "file:///home/peter/work/cwltool/tests/wf/generator/main.cwl",
        "basename": "main.cwl",
        "class": "File",
        "checksum": "sha1$8c160b680fb2cededef3228a53425e595b8cdf48",
        "size": 111,
        "path": "/home/peter/work/cwltool/tests/wf/generator/main.cwl"
    }
}
INFO Final process status is success
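The invocations above could also be driven from Python. The sketch below only assembles the command and environment shown in the shell session; build_invocation and run_zing are hypothetical helpers, and run_zing assumes cwltool is installed and zing.cwl is executable in the current directory.

```python
import os
import subprocess


def build_invocation(args, base_env=None):
    """Return (command, environment) for running ./zing.cwl with extensions on."""
    env = dict(os.environ if base_env is None else base_env)
    # Same flags as in the shell example: enable extensions and dev features.
    env["CWLTOOL_OPTIONS"] = "--enable-dev --enable-ext"
    return ["./zing.cwl"] + list(args), env


def run_zing(args):
    """Run the generator; needs cwltool installed and ./zing.cwl executable."""
    command, env = build_invocation(args)
    return subprocess.run(command, env=env, check=True)


# Build (but do not run) the equivalent of: ./zing.cwl --zing blurf
command, env = build_invocation(["--zing", "blurf"], base_env={})
```

Passing a job file instead of flags works the same way, for example build_invocation(["job.yml"]).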