Configuration

workflow.yaml Configuration

workflow.yaml describes how to run your jobs. You can assign dependencies between jobs. The file name and path do not matter as long as you pass the file to the run command with the -f option.

Minimal description

All jobs will run in parallel.

# workflow.yaml
jobs:
- echo hello
- echo world
- echo again

jupyterflow run -f workflow.yaml

Full description

Job dependencies are resolved based on the dags information.

# workflow.yaml
version: 1

name: workflow-name

jobs:
- echo hello
- echo world
- echo again

cmd_mode: exec   # shell

dags:
- 1 >> 2
- 1 >> 3

schedule: '*/2 * * * *'
Property | Description | Optional | Default
---------|-------------|----------|--------
version | Version of the workflow.yaml file format. | Optional | 1
name | Name of the workflow. This name is used as the Argo Workflow name. | Optional | HOSTNAME of the notebook
jobs | Jobs to run. Any kind of command works. | Required | -
cmd_mode | Command form: exec or shell. | Optional | exec
dags | Job dependencies. Indexes start at 1 ($PREVIOUS_JOB >> $NEXT_JOB). | Optional | All jobs run in parallel (no dependencies)
schedule | When to execute this workflow. Follows cron format. | Optional | Run immediately
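
For example, the dags syntax can express a fan-out followed by a fan-in. The sketch below is illustrative (the job commands are placeholders, and it assumes a job may appear on the right-hand side of multiple entries); the schedule line shows the cron format:

# workflow.yaml -- hypothetical diamond-shaped dependency
jobs:
- echo extract      # job 1
- echo transform-a  # job 2
- echo transform-b  # job 3
- echo load         # job 4

dags:
- 1 >> 2   # job 1 runs before job 2
- 1 >> 3   # job 1 runs before job 3
- 2 >> 4   # job 4 waits for job 2...
- 3 >> 4   # ...and for job 3

schedule: '0 2 * * *'   # assumed example: run every day at 02:00
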
Comparing cmd_mode

exec vs shell

  • In exec mode, your command will be executed as ["echo", "hello"].
  • In shell mode, your command will be executed as ["/bin/sh", "-c", "echo hello"].

In exec mode, the command is more straightforward: no shell process is involved and the command is called directly. In shell mode, you can fully utilize the power of the shell, such as redirection and chaining operators (>>, &&, and so on). For a more detailed explanation, refer to the Docker run form documentation.
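
As a minimal sketch (the path and command are illustrative), the following workflow only works in shell mode, since it relies on && and >>; in exec mode those operators would be passed to the command as literal arguments instead of being interpreted by a shell:

# workflow.yaml -- requires cmd_mode: shell
cmd_mode: shell

jobs:
- mkdir -p /tmp/out && echo hello >> /tmp/out/log.txt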

JupyterFlow Configuration (Advanced)

For more detailed control of JupyterFlow, you can override the Argo Workflow spec by configuring the JupyterFlow config file (default: $HOME/.jupyterflow.yaml). Configuring JupyterFlow requires an understanding of the Kubernetes Pod specification.

The following command creates .jupyterflow.yaml in the $HOME directory. The configuration file path can be changed by setting the JUPYTERFLOW_CONFIG_FILE environment variable (export JUPYTERFLOW_CONFIG_FILE=/tmp/myjupyterflow.yaml).

jupyterflow config --generate-config
# jupyterflow config file created.

cat $HOME/.jupyterflow.yaml   # or run `jupyterflow config` to view the config
# spec:
#   image: jupyter/datascience-notebook:latest
#   imagePullPolicy: Always
#   imagePullSecrets:
#   -  name: "default"
#   env:
#   - name: "CUSTOM_KEY"
#     value: "CUSTOM_VAL"
#   resources:
#     requests:
#       cpu: 500m
#       memory: 500Mi
#     limits:
#       cpu: 500m
#       memory: 500Mi
#   nodeSelector: {}
#   runAsUser: 1000
#   runAsGroup: 100
#   serviceAccountName: default
#   volumes:
#   - name: nas001
#     persistentVolumeClaim:
#       claimName: nas001
#   volumeMounts:
#   - name: nas001
#     mountPath: /nas001

Uncomment the properties you want to override. For example, if you want your workflow jobs to run on GPU nodes, configure the spec.resources or spec.nodeSelector property.

spec:
#   image: jupyter/datascience-notebook:latest
#   imagePullPolicy: Always
#   imagePullSecrets:
#   -  name: "default"
#   env:
#   - name: "CUSTOM_KEY"
#     value: "CUSTOM_VAL"
  resources:
    requests:
      cpu: 500m
      memory: 500Mi
      nvidia.com/gpu: 1
    limits:
      cpu: 500m
      memory: 500Mi
      nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-node
#   runAsUser: 1000
#   runAsGroup: 100
#   serviceAccountName: default
#   volumes:
#   - name: nas001
#     persistentVolumeClaim:
#       claimName: nas001
#   volumeMounts:
#   - name: nas001
#     mountPath: /nas001

Run jupyterflow and check whether your workflow ran on the GPU nodes.

jupyterflow run -f workflow.yaml

Note

The env, volumes, and volumeMounts properties are appended to the original Pod spec; all other properties override it.
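
For instance, with a config like the hypothetical sketch below (reusing names from the generated file above), the env entry and the nas001 volume and mount are added alongside whatever the notebook Pod already defines, while serviceAccountName replaces the original value:

spec:
  env:                          # appended to the Pod's existing env
  - name: "CUSTOM_KEY"
    value: "CUSTOM_VAL"
  volumes:                      # appended to the Pod's existing volumes
  - name: nas001
    persistentVolumeClaim:
      claimName: nas001
  volumeMounts:                 # appended to the Pod's existing volume mounts
  - name: nas001
    mountPath: /nas001
  serviceAccountName: default   # overrides the original value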