Process generator

This is an experimental feature and an unofficial extension to the CWL standards.

A process generator is a CWL Process type that executes a concrete CWL process (CommandLineTool, Workflow or ExpressionTool) which produces CWL files as output, then executes the CWL that was generated.

The intention is to have a formalized way to express a pre-processing or bootstrapping step in which a CWL description is generated by another program (such as from a template, or conversion from another workflow language).

The ProcessGenerator is a subtype of CWL process, so it must define its inputs and outputs. The “run” field is similar to the “run” field of a workflow step – it specifies a tool to run that will create new CWL as output.

- name: ProcessGenerator
  type: record
  inVocab: true
  extends: cwl:Process
  documentRoot: true
  fields:
    - name: class
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"
      type: string
    - name: run
      type: [string, cwl:Process]
      jsonldPredicate:
        _id: "cwl:run"
        _type: "@id"
        subscope: run
      doc: |
        Specifies the process to run.

Process generator example (pytoolgen.cwl)

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
class: cwltool:ProcessGenerator
inputs:
  script: string
  dir: Directory
outputs: {}
run:
  class: CommandLineTool
  inputs:
    script: string
    dir: Directory
  outputs:
    runProcess:
      type: File
      outputBinding:
        glob: main.cwl
  requirements:
    InlineJavascriptRequirement: {}
    cwltool:LoadListingRequirement:
      loadListing: shallow_listing
    InitialWorkDirRequirement:
      listing: |
        ${
         var v = inputs.dir.listing;
         v.push({entryname: "inp.py", entry: inputs.script});
         return v;
        }
  arguments: [python, inp.py]
  stdout: main.cwl

The process generator has two required inputs: “script” and “dir”. It runs the command line tool listed inline in “run” with the input object, which is required to have those parameters. Note: the input object may contain additional parameters which are intended for the generated CWL when it is executed.

The command line tool populates the working directory using InitialWorkDirRequirement. It takes the listing from “dir” and adds a new file literal called “inp.py”, which contains the text from the input parameter “script”. Then it runs “python inp.py”.

The output of this command line tool is the File parameter “runProcess”. In this example, the “inp.py” script, when run, is expected to print the CWL description to standard output, which will be captured in the “runProcess” output parameter.

Next, the ProcessGenerator will load the file in the “runProcess” parameter, which in this example is “main.cwl”. Finally, it will execute the loaded process with the input object that was originally provided to the process generator.

The output of the generated process is used as the output for the ProcessGenerator as a whole.

Here’s an example (zing.cwl) that uses pytoolgen.cwl.

#!/usr/bin/env cwltool
{cwl:tool: pytoolgen.cwl, script: {$include: "#attachment-1"}, dir: {class: Directory, location: .}}
--- |
import os
import sys
print("""
cwlVersion: v1.0
class: CommandLineTool
inputs:
  zing: string
outputs: {}
arguments: [echo, $(inputs.zing)]
""")

The first line #!/usr/bin/env cwltool means that this file can be given the executable bit (+x) and then run directly.

This is a multi-part YAML file. The first section is a CWL input object.

The input object uses “cwl:tool” to indicate that this input object should be used as input to execute “pytoolgen.cwl”.

The parameter script: {$include: "#attachment-1"} takes the text from the second part of the file (following the YAML division marker --- |) and assigns it as a string value to “script”.
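
The numbering of attachments can be illustrated with a small Python sketch. This is a simplified model only, not how cwltool parses the file: the helper names split_parts and resolve_include are hypothetical, and the sketch assumes that every later part starts with a line containing exactly "--- |".

```python
from typing import List, Tuple

# Assumption: each attachment is introduced by a line that is exactly "--- |".
MARKER = "\n--- |\n"


def split_parts(text: str) -> Tuple[str, List[str]]:
    """Split a multi-part file into the first document and its attachments."""
    first, *attachments = text.split(MARKER)
    return first, attachments


def resolve_include(ref: str, attachments: List[str]) -> str:
    """Map a reference such as "#attachment-1" to the matching text (1-based)."""
    prefix = "#attachment-"
    if not ref.startswith(prefix):
        raise ValueError(f"not an attachment reference: {ref!r}")
    return attachments[int(ref[len(prefix):]) - 1]


if __name__ == "__main__":
    doc = 'script: {$include: "#attachment-1"}\n--- |\nprint("hello")\n'
    first, attachments = split_parts(doc)
    # The second part of the file is what "#attachment-1" refers to.
    print(resolve_include("#attachment-1", attachments))
```

In this model the first part of the file stays the input object, and "#attachment-1" picks out the text of the second part.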

The “dir” parameter is not doing much in this example, but by capturing the whole directory it allows the Python script to refer to files in the current directory.

In this example the script simply prints CWL as a string, but it could do something much more complex: generate code from a template, select among several possible workflows based on the input, convert from another workflow language, etc.
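
As a slightly less trivial illustration of the template idea, the following sketch builds the same “echo” tool for an arbitrary parameter name. The function name make_echo_tool is hypothetical and not part of cwltool. A script like this could be supplied as the second part of the file in place of the literal print call.

```python
def make_echo_tool(param: str) -> str:
    """Return a CWL CommandLineTool description that echoes one string input.

    `param` is the name of the input parameter (hypothetical helper, for
    illustration only).
    """
    lines = [
        "cwlVersion: v1.0",
        "class: CommandLineTool",
        "inputs:",
        f"  {param}: string",
        "outputs: {}",
        f"arguments: [echo, $(inputs.{param})]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The process generator captures standard output as main.cwl.
    print(make_echo_tool("zing"))
```

Because the generated description is produced by ordinary Python code, the parameter name (or anything else in the tool) could be derived from files in the working directory.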

When this is executed, the following steps happen:

  1. pytoolgen.cwl is loaded and executed with the first part of the file as the input object.

  2. The “script” parameter contains the contents of the second part. The inline command line tool creates a file called “inp.py” with the contents of “script”.

  3. The inline command line tool runs python on “inp.py” and collects the output, which is a CWL description for a trivial “echo” tool.

  4. It loads the CWL description and executes it with any additional parameters declared in the input object or command line.
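
Steps 2 and 3 above can be approximated outside of cwltool with a short Python sketch. It is an illustration under stated assumptions, not cwltool's implementation: the function name generate_cwl is hypothetical, a temporary directory stands in for the job's working directory, and step 4 (loading and executing main.cwl) is left to a CWL runner.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The same script as in the second part of zing.cwl (unused imports omitted).
SCRIPT = '''print("""
cwlVersion: v1.0
class: CommandLineTool
inputs:
  zing: string
outputs: {}
arguments: [echo, $(inputs.zing)]
""")
'''


def generate_cwl(script: str, workdir: Path) -> Path:
    """Write `script` to inp.py, run it, and capture its stdout as main.cwl."""
    # Step 2: create inp.py with the contents of the "script" parameter.
    (workdir / "inp.py").write_text(script)
    # Step 3: run Python on inp.py and collect standard output.
    result = subprocess.run(
        [sys.executable, "inp.py"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    main_cwl = workdir / "main.cwl"
    main_cwl.write_text(result.stdout)
    return main_cwl


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        generated = generate_cwl(SCRIPT, Path(tmp))
        # Step 4 would hand this file to a CWL runner; here it is only printed.
        print(generated.read_text())
```

The sketch uses sys.executable rather than a bare “python” so that it runs with the same interpreter that invoked it.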

Example runs

Note: requires cwltool flags --enable-ext and --enable-dev

You can set these with the environment variable CWLTOOL_OPTIONS:

$ export CWLTOOL_OPTIONS="--enable-dev --enable-ext"

$ ./zing.cwl
INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
INFO [job d3626216-d7d8-4322-bc21-4d469634cc9a] /tmp/8sez90gb$ python \
    inp.py > /tmp/8sez90gb/main.cwl
INFO [job d3626216-d7d8-4322-bc21-4d469634cc9a] completed success
usage: ./zing.cwl [-h] --zing ZING [job_order]
./zing.cwl: error: the following arguments are required: --zing
$ ./zing.cwl --zing blurf
INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
INFO [job a580b69d-2b88-4268-904e-ed105ba7c85e] /tmp/ujff239o$ python \
    inp.py > /tmp/ujff239o/main.cwl
INFO [job a580b69d-2b88-4268-904e-ed105ba7c85e] completed success
INFO [job main.cwl] /tmp/f_7bxncq$ echo \
    blurf
blurf
INFO [job main.cwl] completed success
{
    "runProcess": {
        "location": "file:///home/peter/work/cwltool/tests/wf/generator/main.cwl",
        "basename": "main.cwl",
        "class": "File",
        "checksum": "sha1$8c160b680fb2cededef3228a53425e595b8cdf48",
        "size": 111,
        "path": "/home/peter/work/cwltool/tests/wf/generator/main.cwl"
    }
}
INFO Final process status is success
$ echo "zing: zoop" > job.yml
$ ./zing.cwl job.yml
INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
INFO [job 9073a083-dc79-4719-8762-1c024480605c] /tmp/meeo3d19$ python \
    inp.py > /tmp/meeo3d19/main.cwl
INFO [job 9073a083-dc79-4719-8762-1c024480605c] completed success
INFO [job main.cwl] /tmp/2pqdz5nq$ echo \
    zoop
zoop
INFO [job main.cwl] completed success
{
    "runProcess": {
        "location": "file:///home/peter/work/cwltool/tests/wf/generator/main.cwl",
        "basename": "main.cwl",
        "class": "File",
        "checksum": "sha1$8c160b680fb2cededef3228a53425e595b8cdf48",
        "size": 111,
        "path": "/home/peter/work/cwltool/tests/wf/generator/main.cwl"
    }
}
INFO Final process status is success