The Kubeflow Pipelines SDK allows data scientists to define end-to-end machine learning and data pipelines. The output of the Kubeflow Pipelines SDK compiler is YAML for Argo.
The kfp-tekton SDK extends the Compiler and the Client of the Kubeflow Pipelines SDK to generate Tekton YAML and to subsequently upload and run the pipeline with the Kubeflow Pipelines engine backed by Tekton.
- Extensions to the Kubeflow Pipelines SDK
- Project Prerequisites
- Installation
- Compiling a Kubeflow Pipelines DSL Script
- Running the Compiled Pipeline on a Tekton Cluster
- Building Tekton from Master
- Optional Features
- List of Available Features
- Tested Pipelines
- Troubleshooting
In addition to the functionality provided by the Kubeflow Pipelines SDK, the kfp-tekton SDK provides a TektonCompiler and a TektonClient:
- TektonCompiler: kfp_tekton.compiler.TektonCompiler.compile compiles Python DSL code into a YAML file containing a Tekton PipelineRun, which can be deployed directly to a Tekton-enabled Kubernetes cluster or uploaded to the Kubeflow Pipelines dashboard with the Tekton backend.
- TektonClient: kfp_tekton.TektonClient.create_run_from_pipeline_func compiles a DSL pipeline function and runs the pipeline on a Kubernetes cluster with KFP and Tekton.
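For orientation, the compiled output is a Kubernetes manifest of kind PipelineRun. The sketch below is an illustrative, hand-written approximation of that shape only; the actual compiler output contains many more fields (labels, annotations, parameters, full task specs), and the image name is a placeholder:

```python
import json

# Illustrative-only sketch of the manifest shape the compiler emits;
# NOT actual kfp-tekton output. "busybox" is a placeholder image.
pipeline_run = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "PipelineRun",
    "metadata": {"name": "parallel-pipeline"},
    "spec": {
        "pipelineSpec": {
            "tasks": [
                {
                    "name": "gcs-download",
                    "taskSpec": {
                        "steps": [{"name": "main", "image": "busybox"}]
                    },
                }
            ]
        }
    },
}

# Print the manifest as indented JSON (YAML-equivalent structure).
print(json.dumps(pipeline_run, indent=2))
```

Deploying a manifest of this kind is what triggers the Tekton controller to start a pipeline run.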
- Python: 3.5.3 or later
- Tekton: 0.14.0
- Tekton CLI: 0.10.0
- Kubeflow Pipelines: KFP with Tekton backend
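The Python prerequisite can be checked from the interpreter itself; the `(3, 5, 3)` tuple below mirrors the minimum version listed above:

```python
import sys

# Minimum Python version required by kfp-tekton, per the prerequisites above.
MIN_VERSION = (3, 5, 3)

# Compare the running interpreter's (major, minor, micro) tuple
# against the minimum; tuples compare element-wise in Python.
ok = sys.version_info[:3] >= MIN_VERSION
print("Python version OK:", ok)
```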
Follow the instructions for installing project prerequisites and take note of some important caveats.
You can install the latest release of the kfp-tekton compiler from PyPI. We recommend creating a Python virtual environment first:

```shell
python3 -m venv .venv
source .venv/bin/activate
pip install kfp-tekton
```
Alternatively, you can install the latest version of the kfp-tekton compiler from source by cloning the repository https://github.com/kubeflow/kfp-tekton:

- Clone the kfp-tekton repo:

  ```shell
  git clone https://github.com/kubeflow/kfp-tekton.git
  cd kfp-tekton
  ```

- Set up a Python environment with Conda or a Python virtual environment:

  ```shell
  python3 -m venv .venv
  source .venv/bin/activate
  ```

- Build the compiler:

  ```shell
  pip install -e sdk/python
  ```

- Run the compiler tests (optional):

  ```shell
  make test
  ```
The kfp-tekton Python package comes with the dsl-compile-tekton command-line executable, which should be available in your terminal shell environment after installation.
If you cloned the kfp-tekton project, you can find example pipelines in the samples folder or under the sdk/python/tests/compiler/testdata folder.

```shell
dsl-compile-tekton \
    --py sdk/python/tests/compiler/testdata/parallel_join.py \
    --output pipeline.yaml
```
Note: If the KFP DSL script contains a __main__ method calling the kfp_tekton.compiler.TektonCompiler.compile() function:

```python
if __name__ == "__main__":
    from kfp_tekton.compiler import TektonCompiler
    TektonCompiler().compile(pipeline_func, "pipeline.yaml")
```

... then the pipeline can be compiled by running the DSL script with the python3 executable from a command-line shell, producing a Tekton YAML file pipeline.yaml in the same directory:

```shell
python3 pipeline.py
```
After compiling the sdk/python/tests/compiler/testdata/parallel_join.py DSL script in the step above, we need to deploy the generated Tekton YAML to our Kubernetes cluster with kubectl. The Tekton server will automatically start a pipeline run, whose logs we can follow with the tkn CLI.

We have to deploy the pipeline in the kubeflow namespace because all pipelines with metadata and artifact tracking rely on the MinIO object storage credentials in that namespace.
```shell
kubectl apply -f pipeline.yaml -n kubeflow
tkn pipelinerun logs --last -n kubeflow
```
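When scripting these two steps, building the argument lists explicitly keeps the namespace in one place. The helper below is hypothetical (not part of kfp-tekton); it only constructs the argument lists shown above, suitable for passing to subprocess.run():

```python
# Hypothetical helper, not part of kfp-tekton: build the kubectl/tkn
# argument lists used above, e.g. for subprocess.run(). Only the
# command construction is shown; nothing is executed here.
def deploy_commands(yaml_file, namespace="kubeflow"):
    apply_cmd = ["kubectl", "apply", "-f", yaml_file, "-n", namespace]
    logs_cmd = ["tkn", "pipelinerun", "logs", "--last", "-n", namespace]
    return apply_cmd, logs_cmd

apply_cmd, logs_cmd = deploy_commands("pipeline.yaml")
print(" ".join(apply_cmd))
print(" ".join(logs_cmd))
```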
Once the Tekton Pipeline is running, the logs should start streaming:
```
Waiting for logs to be available...
[gcs-download : main] With which he yoketh your rebellious necks Razeth your cities and subverts your towns And in a moment makes them desolate
[gcs-download : copy-artifacts] Added `storage` successfully.
[gcs-download : copy-artifacts] tar: removing leading '/' from member names
[gcs-download : copy-artifacts] tekton/results/data
[gcs-download : copy-artifacts] `data.tgz` -> `storage/mlpipeline/artifacts/parallel-pipeline/gcs-download/data.tgz`
[gcs-download : copy-artifacts] Total: 0 B, Transferred: 195 B, Speed: 1 B/s
[gcs-download-2 : main] I find thou art no less than fame hath bruited And more than may be gatherd by thy shape Let my presumption not provoke thy wrath
[gcs-download-2 : copy-artifacts] Added `storage` successfully.
[gcs-download-2 : copy-artifacts] tar: removing leading '/' from member names
[gcs-download-2 : copy-artifacts] tekton/results/data
[gcs-download-2 : copy-artifacts] `data.tgz` -> `storage/mlpipeline/artifacts/parallel-pipeline/gcs-download-2/data.tgz`
[gcs-download-2 : copy-artifacts] Total: 0 B, Transferred: 205 B, Speed: 1 B/s
[echo : main] Text 1: With which he yoketh your rebellious necks Razeth your cities and subverts your towns And in a moment makes them desolate
[echo : main]
[echo : main] Text 2: I find thou art no less than fame hath bruited And more than may be gatherd by thy shape Let my presumption not provoke thy wrath
[echo : main]
```
To use the latest features and functions of the kfp-tekton compiler, we suggest installing Tekton from a nightly build or building it from the master branch. Features that require a special build, different from the 'Tested Version', are listed below.
By default, artifacts are enabled because KFP DSL scripts are designed to run on the Kubeflow Pipelines engine, with artifacts stored in MinIO object storage. When artifacts are enabled, all output parameters are also treated as artifacts and persisted to the default object storage. Enabling artifacts also allows files to be downloaded or stored as artifact inputs/outputs. Since artifacts depend on the Kubeflow Pipelines deployment, the generated Tekton pipeline must be deployed to the same namespace as Kubeflow Pipelines.
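The artifact locations visible in the log output above follow a recognizable pattern. The sketch below is inferred from those logs only (bucket name `mlpipeline`, `.tgz` packaging); treat it as an observation of the default layout, not a stable API:

```python
# Artifact object path as observed in the pipeline logs above, e.g.
# mlpipeline/artifacts/parallel-pipeline/gcs-download/data.tgz
# This pattern is inferred from the log output, not a documented contract.
def artifact_path(pipeline, task, artifact="data"):
    return "mlpipeline/artifacts/%s/%s/%s.tgz" % (pipeline, task, artifact)

print(artifact_path("parallel-pipeline", "gcs-download"))
```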
To run Tekton pipelines without installing Kubeflow Pipelines, or if you need to compile the Kubeflow Pipelines DSL without artifacts, add the --disable-artifacts argument to your dsl-compile-tekton command. Then run the pipeline in the same namespace that is used by Kubeflow Pipelines (typically kubeflow) by specifying the -n flag:
```shell
dsl-compile-tekton \
    --py sdk/python/tests/compiler/testdata/parallel_join.py \
    --output pipeline.yaml \
    --disable-artifacts

kubectl apply -f pipeline.yaml -n kubeflow
tkn pipelinerun logs --last -n kubeflow
```
You should see log messages without any artifact reference:
```
Waiting for logs to be available...
[gcs-download : main] With which he yoketh your rebellious necks Razeth your cities and subverts your towns And in a moment makes them desolate
[gcs-download-2 : main] I find thou art no less than fame hath bruited And more than may be gatherd by thy shape Let my presumption not provoke thy wrath
[echo : main] Text 1: With which he yoketh your rebellious necks Razeth your cities and subverts your towns And in a moment makes them desolate
[echo : main]
[echo : main] Text 2: I find thou art no less than fame hath bruited And more than may be gatherd by thy shape Let my presumption not provoke thy wrath
[echo : main]
```
To understand how each feature is implemented and its current status, please visit the FEATURES doc.
We are testing the compiler on more than 80 pipelines found in the Kubeflow Pipelines repository, specifically the pipelines in the KFP compiler testdata folder, the KFP core samples, and the samples contributed by third parties. A report card of Kubeflow Pipelines samples that are currently supported by the kfp-tekton compiler can be found here.
If you work on a PR that enables another of the missing features, please ensure that your code changes increase the number of successfully compiled KFP pipeline samples.
- When you encounter permission issues related to ServiceAccount, refer to the Service Account and RBAC doc.
- If you run into the error bad interpreter: No such file or directory when trying to use Python's venv, remove the current virtual environment in the .venv directory and create a new one using virtualenv .venv.
- For big data passing, users need to create a PV manually or enable dynamic volume provisioning; refer to https://kubernetes.io/docs/concepts/storage/dynamic-provisioning. Users need to create the PVC manually, with the PVC name matching the PipelineRun name, until issue #181 is addressed.
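Because the PVC name must currently match the PipelineRun name, generating the claim programmatically avoids typos. The sketch below builds such a manifest as a plain Python dict; the run name, storage size, and access mode are placeholder values, not kfp-tekton defaults:

```python
import json

# Build a PersistentVolumeClaim manifest whose metadata.name matches
# the PipelineRun name, as currently required for big data passing.
# Storage size and access mode are placeholder values.
def pvc_for_pipelinerun(run_name, size="1Gi"):
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": run_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }

manifest = pvc_for_pipelinerun("parallel-pipeline-run")
print(json.dumps(manifest, indent=2))
```

The JSON output can be applied directly with kubectl, since Kubernetes accepts JSON as well as YAML manifests.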