How it works

JupyterFlow has a strict constraint: it only works on Kubernetes (JupyterHub for Kubernetes or Kubeflow). This is because JupyterFlow collects the user's execution environment information from the notebook Pod's metadata, and the user's source code from a Kubernetes storage volume. It uses this information to construct a new Kubernetes manifest (an Argo Workflow) for the ML job on your behalf, without requiring you to containerize anything.
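To read its own Pod metadata, a process running inside the notebook Pod first has to know which Pod it is in. The snippet below is a minimal sketch that assumes Kubernetes defaults: the `HOSTNAME` environment variable holds the Pod name (true unless `spec.hostname` is overridden), and the service account mount exposes the namespace at `/var/run/secrets/kubernetes.io/serviceaccount/namespace`. The helper name `current_pod_identity` is hypothetical and is not part of JupyterFlow.

```python
import os
from pathlib import Path
from typing import Tuple

# Default path where Kubernetes mounts the Pod's service account namespace.
SA_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def current_pod_identity(namespace_file: str = SA_NAMESPACE_FILE) -> Tuple[str, str]:
    """Return (pod_name, namespace) for the Pod this process runs in.

    Hypothetical helper. Assumes HOSTNAME equals the Pod name and that the
    service account is mounted at its default location.
    """
    pod_name = os.environ.get("HOSTNAME", "")
    if not pod_name:
        raise RuntimeError("HOSTNAME is not set; is this running inside a Pod?")

    path = Path(namespace_file)
    if not path.is_file():
        raise RuntimeError(f"{namespace_file} not found; is a service account mounted?")

    # The file holds the namespace name, possibly with a trailing newline.
    namespace = path.read_text().strip()
    return pod_name, namespace
```

With the name and namespace in hand, the Pod object can be read from the Kubernetes API (`GET /api/v1/namespaces/{namespace}/pods/{name}`), provided the service account is allowed to `get` pods in that namespace.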

JupyterFlow collects the following metadata from the Jupyter notebook Pod:

  • Container image
  • Environment variables
  • Home directory volume (home PersistentVolumeClaim)
  • Extra volume mount points
  • Resource management (requests, limits)
  • NodeSelector label
  • Etc.
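The collection step above can be sketched as a pure function over the Pod object, here assumed to be the JSON returned by the Kubernetes API (the same shape as `kubectl get pod <name> -o json`) parsed into a Python dict. The function name `extract_runtime_metadata`, its returned keys, the `/home/jovyan` home path, and the choice of the first container as the notebook container are all illustrative assumptions, not JupyterFlow's actual code.

```python
from typing import Any, Dict, List, Optional

# Assumption: mounts under this prefix are Kubernetes-managed (service account
# token etc.) and should not be copied into the generated workflow.
K8S_MANAGED_PREFIX = "/var/run/secrets/kubernetes.io"


def extract_runtime_metadata(pod: Dict[str, Any],
                             home_mount_path: str = "/home/jovyan") -> Dict[str, Any]:
    """Pull the execution-environment fields listed above out of a Pod dict."""
    spec = pod["spec"]
    # Assumption: the notebook server is the first container in the Pod.
    container = spec["containers"][0]
    volumes = {v["name"]: v for v in spec.get("volumes", [])}

    home_pvc: Optional[str] = None
    extra_mounts: List[Dict[str, Any]] = []
    for mount in container.get("volumeMounts", []):
        path = mount["mountPath"]
        if path.startswith(K8S_MANAGED_PREFIX):
            continue  # skip the service account token mount
        volume = volumes.get(mount["name"], {})
        if path == home_mount_path and "persistentVolumeClaim" in volume:
            # The home directory volume: remember its PVC name.
            home_pvc = volume["persistentVolumeClaim"]["claimName"]
        else:
            # Any other user-visible mount point.
            extra_mounts.append({"name": mount["name"], "mountPath": path, "volume": volume})

    return {
        "image": container["image"],
        "env": container.get("env", []),
        "home_pvc": home_pvc,
        "home_mount_path": home_mount_path,
        "extra_mounts": extra_mounts,
        "resources": container.get("resources", {}),
        "node_selector": spec.get("nodeSelector", {}),
    }
```

Keeping this step free of network calls makes it easy to test against a saved Pod manifest before wiring it to a live cluster.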

The following pseudocode illustrates how JupyterFlow works.

  • JupyterFlow main logic
jupyterflow run -c "python >> python"
# ...
# inside jupyterflow
# ...

# parse the user's workflow (DAG) from the command-line input.
user_workflow_data = get_user_workflow(user_input)

# collect metadata of the current environment (the Jupyter notebook Pod).
nb_pod_spec = get_current_pod_spec_from_k8s(jupyter_notebook_pod_name, service_account)

# build the Argo Workflow manifest from the Pod metadata and the user workflow information.
argo_workflow_spec = build_workflow(nb_pod_spec, user_workflow_data)

# submit the new Argo Workflow to the Kubernetes API server.
response = request_for_new_workflow_to_k8s(K8S_MASTER, argo_workflow_spec, service_account)
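The `get_user_workflow` and `build_workflow` steps in the pseudocode can be sketched as below. This sketch assumes that `>>` in the `-c` string chains jobs sequentially (each job depends on the previous one) and that every job reuses the notebook's image and home PersistentVolumeClaim; the function names, the `job-N` task naming, and the `jupyterflow-` name prefix are hypothetical. The manifest fields themselves (`dag.tasks`, `dependencies`, `container`, `volumes`) follow the Argo Workflow spec.

```python
import shlex
from typing import Any, Dict, List, Optional


def parse_user_workflow(command: str) -> List[List[str]]:
    """Split 'cmd1 >> cmd2 >> ...' into argv lists, in run order.

    Assumption: '>>' means "run left, then right". A '>>' inside quotes is
    not handled by this simple split.
    """
    jobs = [shlex.split(part) for part in command.split(">>")]
    if any(not argv for argv in jobs):
        raise ValueError(f"empty job in workflow: {command!r}")
    return jobs


def build_workflow(jobs: List[List[str]], image: str, home_pvc: str,
                   home_mount_path: str = "/home/jovyan",
                   env: Optional[List[Dict[str, str]]] = None,
                   resources: Optional[Dict[str, Any]] = None,
                   node_selector: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build an Argo Workflow manifest (as a dict) that runs each job with the
    notebook's image and home volume, so no new image has to be built."""
    tasks: List[Dict[str, Any]] = []
    templates: List[Dict[str, Any]] = []
    for i, argv in enumerate(jobs):
        name = f"job-{i}"
        task: Dict[str, Any] = {"name": name, "template": name}
        if i > 0:
            task["dependencies"] = [f"job-{i - 1}"]  # sequential chain
        tasks.append(task)
        templates.append({
            "name": name,
            "container": {
                "image": image,
                "command": argv,
                "workingDir": home_mount_path,
                "env": env or [],
                "resources": resources or {},
                "volumeMounts": [{"name": "home", "mountPath": home_mount_path}],
            },
        })
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "jupyterflow-"},
        "spec": {
            "entrypoint": "main",
            "nodeSelector": node_selector or {},
            "volumes": [{"name": "home", "persistentVolumeClaim": {"claimName": home_pvc}}],
            # The entrypoint DAG template comes first, then one template per job.
            "templates": [{"name": "main", "dag": {"tasks": tasks}}] + templates,
        },
    }
```

Submitting the resulting dict corresponds to the final pseudocode step; one way to do it is a POST to the Argo custom resource endpoint `/apis/argoproj.io/v1alpha1/namespaces/{namespace}/workflows`, which requires the service account to have `create` permission on `workflows`.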

