Loops

The cwltool:Loop requirement enables workflow-level looping of a step. It is valid only under requirements of a WorkflowStep. Unlike other CWL requirements, Loop requirement is not propagated to inner steps.

The cwltool:Loop is not compatible with scatter and when. Combining a cwltool:Loop requirement with a scatter or a when clause in the same step will produce an error.

The loop condition

The loopWhen field controls loop termination. It is an expansion of the CWL v1.2 when construct, which controls conditional execution. This is an expression that must be evaluated with inputs bound to the step input object and outputs produced in the last step execution, and returns a boolean value. It is an error if this expression returns a value other than true or false. For example:

example:
  run:
    class: ExpressionTool
    inputs:
      i1: int
    outputs:
      o1: int
    expression: >
      ${return {'o1': inputs.i1 + 1};}
  in:
    i1: i1
  out: [o1]
  requirements:
    cwltool:Loop:
      loopWhen: $(inputs.i1 < 10)
      loop:
        i1: o1
      outputMethod: last

This loop executes untile the counter i1 reaches the value of 10, and then terminates. Note that if the loopWhen condition evaluates to false prior to the first iteration, the loop is skipped. The value assumed by the output fields depends on the specified outputMethod, as described below.

The loop field

The loop field defines the input parameters of the loop iterations after the first one (inputs of the first iteration are the step input parameters). If no loop rule is specified for a given step in field, the initial value is kept constant among all iterations.

The LoopInput is basically a reduced version of the WorkflowStepInput structure with the possibility to include outputs of the previous step execution in the valueFrom expression.

Field	Required	Type	Description
`id`	optional	string	It must reference the `id` of one of the elements in the `in` field of the step.
`loopSource`	optional	string? \| string[]?	Specifies one or more of the step output parameters that will provide input to the loop iterations after the first one (inputs of the first iteration are the step input parameters).
`linkMerge`	optional	LinkMergeMethod	The method to use to merge multiple inbound links into a single array. If not specified, the default method is `merge_nested`.
`pickValue`	optional	PickValueMethod	The method to use to choose non-null elements among multiple sources.
`valueFrom`	optional	string \| Expression	To use `valueFrom`, StepInputExpressionRequirement must be specified in the workflow or workflow step requirements. If `valueFrom` is a constant string value, use this as the value for this input parameter. If `valueFrom` is a parameter reference or expression, it must be evaluated to yield the actual value to be assigned to the input field. The `self` value in the parameter reference or expression must be `null` if there is no `loopSource` field, or the value of the parameter(s) specified in the `loopSource` field. The value of `inputs` in the parameter reference or expression must be the input object to the previous iteration of the workflow step (or the initial inputs for the first iteration).

Loop output modes

The outputMethod field specifies the desired method of dealing with loop outputs. It behaves similarly to the scatterMethod field. For the sake of simplicity, there can be a single outputMethod field for each step instead of specifying a different behaviour for each output element. The outputMethod field can take two possible values: last or all.

The last output mode propagates only the last computed element to the subsequent steps when the loop terminates. When a loop with an outputMethod equal to last is skipped, each output assumes a null value.

This is the most recurrent behaviour and it is typical of the optimization processes, when a step must iterate until a desired precision is reached. For example:

optimization:
  in:
    a: a
    prev_a:
      default: ${ return inputs.a - (2 * inputs.threshold) }
    threshold: threshold
  run: optimize.cwl
  out: [a]
  requirements:
    cwltool:Loop:
      loopWhen: ${ return (inputs.a - inputs.prev_a) > inputs.threshold)
      loop:
        a: a
        prev_a:
          valueFrom: $(inputs.a)
      outputMethod: last

This loop keeps optimizing the initial a value until the error value falls below a given (constant) threshold. Then, the last values of a will be propagated.

The all output mode propagates a single array with all output values to the subsequent steps when the loop terminates. When a loop with an outputMethod equal to all is skipped, each output assumes a [] value.

This behaviour is needed when a recurrent simulation produces loop-carried results, but the subsequent steps need to know the total amount of computed values to proceed. For example:

simulation:
  in:
    a: a
    day:
      default: 0
    max_day: max_day
  run: simulate.cwl
  out: [a]
  requirements:
    cwltool:Loop:
      loopWhen: ${ return inputs.day < inputs.max_day }
      loop:
        a: a
        day:
          valueFrom: $(inputs.day + 1)
      outputMethod: all

In this case, subsequent steps can start processing outputs even before the simulation step terminates. When a loop with an outputMethod equal to last is skipped, each output assumes a null value.

Loop-independent iterations

If a cwltool:Loop comes with loop-independent iterations, i.e. if each iteration does not depend on the result produced by the previous ones, all iterations can be processed concurrently. For example:

example:
  run: inner.cwl
  in:
    i1: i1
  out: [o1]
  requirements:
    cwltool:Loop:
      loopWhen: $(inputs.i1 < 10)
      loop:
        i1:
          valueFrom: $(inputs.i1 + 1)
      outputMethod: all

Since each iteration of this loop only depends on the input field i1, all its iterations can be processed in parallel if there is enough computing power.