Releases: sky-uk/kfp-operator
v0.6.0
Namespace isolation of Vertex AI resources
In order to preserve isolation of resources from multiple namespaces submitting to a single Vertex AI project, resources (Schedules, Pipeline Runs, Pipeline definition storage location, Artefact Storage Location) are now prefixed with their origin namespace.
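As a rough illustration of the prefixing scheme described above, the sketch below derives a namespaced resource name. The helper name `namespaced_resource_name` and the `-` separator are assumptions made for illustration only; these notes do not state the exact format the operator uses.

```python
def namespaced_resource_name(namespace: str, name: str, separator: str = "-") -> str:
    """Prefix a resource name with its origin namespace.

    Hypothetical helper: the separator and naming format are assumptions,
    not the operator's confirmed behaviour.
    """
    if not namespace or not name:
        raise ValueError("namespace and name must be non-empty")
    return f"{namespace}{separator}{name}"


# Two namespaces submitting a pipeline with the same name get distinct resource names.
team_a = namespaced_resource_name("team-a", "training-pipeline")
team_b = namespaced_resource_name("team-b", "training-pipeline")
```

With a scheme like this, resources from different namespaces can share one Vertex AI project without overwriting each other.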
What's Changed
- Reduce NATS connectionBackoff in eventsource by @grahamia in #323
- Change to CONTAINER_REPOSITORIES for artifact repository location by @grahamia in #324
- VAI Provider - Namespaced custom resources for pipelines (scheduled and one off) by @grahamia in #326
Full Changelog: v0.5.0...v0.6.0
Migration
Migrating from v0.5.0 to v0.6.0 requires re-compiling all pipeline definitions and re-applying run-schedules. A script is available to help trigger existing pipelines and remove existing run-schedules, which will then be recreated automatically.
Note: The script requires jq version >1.7 and bash >4.
The CONTAINER_REPOSITORIES environment variable has replaced the existing CONTAINER_REGISTRY_HOSTS.
v0.5.0
Native Vertex AI Scheduler API
Due to Vertex AI's lack of support for scheduled pipeline runs, the KFP-Operator had to create Google Cloud Scheduler objects as well as PubSub subscriptions for managing enqueued and ongoing runs. This setup has now been superseded by Vertex AI's scheduler API as well as Vertex AI's task-level event logs.
This release now only uses the native scheduler within Vertex AI.
What's Changed
Migration
If you already have running schedules in Google Cloud Scheduler, first upgrade to v0.4.1, then ensure all schedules have been migrated from Google Cloud Scheduler into Vertex AI Scheduler. This can be done by deleting all scheduler resources:
> kubectl delete mlrs --all
The KFP-Operator will then automatically migrate all the schedules across.
Then upgrade to this version.
Full Changelog: v0.4.1...v0.5.0
v0.4.1
Native Vertex AI Scheduler API
Due to Vertex AI's lack of support for scheduled pipeline runs, the KFP-Operator had to create Google Cloud Scheduler objects as well as PubSub subscriptions for managing enqueued and ongoing runs. This setup has now been superseded by Vertex AI's scheduler API as well as Vertex AI's task-level event logs.
This release migrates existing Cloud Scheduler scheduled runs on any update to the schedule and sets up the updated schedule within Vertex AI Scheduler. Currently scheduled jobs will carry on working as normal.
The suggested process for migration is to delete all RunSchedule resources. The KFP-Operator will then reconcile the corresponding RunConfiguration resources by recreating schedules in Vertex AI.
The next release will have this migration process and all the legacy GCP Cloud Scheduler code/setup removed.
Improvements
- Add metadata to status-updater sensor metadata #289
- Complete RCs with one-off runs #291
- Succeed RC when dependencies are not met #292
- Refactor argo-events gRPC API dependency #296
- Make topic in public eventbus configurable #301
- Handle fullstops in pipeline version #306
- Provide provider name in RunCompletionEvents #310
- Support TFX 1.14 #297
- Update quickstart image #314
- Log-based events #311
- Vertex AI Provider - migrate to using Vertex AI provided scheduler #319
Bugfixes
- Fix intermittent decoupled test failures #312
Full Changelog: v0.4.0...v0.4.1
v0.4.0
Training-Time Model Ensembling
We have introduced support for declaring dependencies between training pipelines at training time through produced and consumed artifacts.
Improvements
- Add new RunConfigurations triggers
  - on changes to the referenced pipeline
  - on changes to the definition of the corresponding run
  - on completion of another run configuration
- Expose artifacts to be consumed by a dependent run configuration
See https://sky-uk.github.io/kfp-operator/docs/getting-started/example/ for an in-depth example of training-time model ensembling
Bug fixes
- Store and propagate provider in RunConfigurations #232
- Filter Runschedules marked for deletion #235
- Allow valid docker tags in pipeline identifier #281
- Initialise ServingModelArtifacts in run completion events #282
Deprecation notes
- All versions other than v1alpha5 are deprecated, and all resources should be upgraded to the latest schema
- servingModelArtifacts in run completion events has been deprecated in favour of the more generic artifacts
Migration
After upgrading to this version, perform the following steps to ensure optimal behaviour:
- Force re-upload of RunConfigurations by deleting all RunSchedules and triggering re-creation
v0.3.0
Vertex AI Support
We have introduced support for managing machine learning resources on Vertex AI declaratively.
Improvements
- Vertex AI support
- Support multiple providers in a single KFP-Operator instance #171
- Provider workflows now run in a dedicated namespace instead of the user namespace #183
- Introduce one-off pipeline run resource #64
- The eventing system has been redesigned (see updated docs for details) #89
Bug fixes
- RunConfiguration runtime parameters not created #175
v0.2.1
Python 3.9 support
We have introduced support for TFX pipelines built using Python 3.9. This means TFX is now supported up to its recent release, 1.9.1.
Improvements
- #160 Changes the compiler so that the pipeline's Python version is detected and the respective compiler path is set.
- #161 Changes the way CRD version conversion works. We have made the decision to never error in version conversions and to preserve incompatible fields in all versions. This allows the K8s API server and other components to keep requesting old versions, even if they are compatible.
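The idea of a conversion that never errors and preserves incompatible fields can be sketched as follows. This is a minimal illustration, not the operator's implementation: the schemas (a map-typed `env` in the old version, a list-typed `env` allowing duplicate names in the new one), the function names and the annotation key `REMAINDER_KEY` are all assumptions.

```python
import json
from typing import Any

# Hypothetical annotation key carrying data the older schema cannot express.
REMAINDER_KEY = "example.invalid/conversion-remainder"


def to_old(new: dict[str, Any]) -> dict[str, Any]:
    """Down-convert a list of name/value entries to a map without raising."""
    env_map: dict[str, str] = {}
    remainder: list[dict[str, str]] = []
    for entry in new["env"]:
        if entry["name"] in env_map:
            # A map cannot hold duplicate names, so set the extra entry aside.
            remainder.append(entry)
        else:
            env_map[entry["name"]] = entry["value"]
    old: dict[str, Any] = {"env": env_map, "annotations": {}}
    if remainder:
        old["annotations"][REMAINDER_KEY] = json.dumps(remainder)
    return old


def to_new(old: dict[str, Any]) -> dict[str, Any]:
    """Up-convert the map and re-attach any entries set aside by to_old."""
    entries = [{"name": k, "value": v} for k, v in old["env"].items()]
    preserved = old.get("annotations", {}).get(REMAINDER_KEY)
    if preserved:
        entries.extend(json.loads(preserved))
    return {"env": entries}


original = {"env": [
    {"name": "A", "value": "1"},
    {"name": "B", "value": "2"},
    {"name": "A", "value": "3"},
]}
round_tripped = to_new(to_old(original))
```

In this sketch no entry is lost in a round trip through the older version, although entries with duplicate names may end up in a different position in the list.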
v0.2.0
Workflow Templates, Named Lists and Schema Conversions
This release increases schema versions to v1alpha3.
Schema conversions from v1alpha2 onwards now support Kubernetes CRD conversions, allowing users to migrate resources in their own time.
Improvements
- #90 Argo Workflows have been refactored to use workflow templates stored in the cluster, which will be beneficial for upcoming work supporting the Vertex AI backend.
- #31 All map fields in CRDs have been restructured to follow the K8s convention:
apiVersion: pipelines.kubeflow.org/v1alpha1
kind: Pipeline
metadata:
  name: pipeline-sample
spec:
  env:
    ENV_ARG: example
  beamArgs:
    experiments: an_experiment
will now be
apiVersion: pipelines.kubeflow.org/v1alpha3
kind: Pipeline
metadata:
  name: pipeline-sample
spec:
  env:
  - name: ENV_ARG
    value: example
  beamArgs:
  - name: experiments
    value: an_experiment
Consequently, beamArgs may now contain duplicate names, each of which will be passed on.
- #67 RunConfigurations can now train pipelines at specified versions in addition to tracking the latest changes. pipelineName has therefore been renamed to pipeline to allow specifying a pipeline with and without a version:
apiVersion: pipelines.kubeflow.org/v1alpha3
kind: RunConfiguration
metadata:
  name: pipeline-sample
spec:
  pipeline: pipeline-sample:257c1e6-440251
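The two schema changes above can be sketched in a few lines of Python: turning a map field into a list of name/value entries, and splitting a pipeline reference into a name and an optional version. The helper names `map_to_named_list` and `parse_pipeline_ref` are hypothetical, and splitting on the first colon is an assumption based on the example value shown above, not a statement of how the operator parses the field.

```python
from typing import NamedTuple, Optional


def map_to_named_list(values: dict[str, str]) -> list[dict[str, str]]:
    """Turn a map such as {"ENV_ARG": "example"} into a list of name/value entries."""
    return [{"name": name, "value": value} for name, value in values.items()]


class PipelineRef(NamedTuple):
    name: str
    version: Optional[str]


def parse_pipeline_ref(ref: str) -> PipelineRef:
    """Split "name:version" on the first colon; version is None when absent.

    Assumption: the pipeline name itself contains no colon.
    """
    name, _, version = ref.partition(":")
    if not name:
        raise ValueError(f"pipeline reference has no name: {ref!r}")
    return PipelineRef(name, version or None)


env = map_to_named_list({"ENV_ARG": "example"})
pinned = parse_pipeline_ref("pipeline-sample:257c1e6-440251")
latest = parse_pipeline_ref("pipeline-sample")
```

Here `pinned` carries an explicit version, while `latest` has no version and would correspond to tracking the latest changes.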
v0.1.1
CRD Version Downgrade
In previous versions, all CRDs were released as v1, which doesn't represent the state of the project correctly. This release downgrades all CRD versions to v1alpha1, allowing future releases to incrementally increase this version.
This is a breaking release and we recommend not using a version prior to this release. If you have installed a previous version and want to upgrade, you will have to manually migrate resources. Please reach out via https://github.com/sky-uk/kfp-operator/discussions if you need assistance.
v0.1.0
Public Alpha
As part of this release, ownership labels in workflows have been renamed. This is a breaking change, and the upgrade requires manual migration detailed in the PR.
Improvements
- #138 High Availability
Bug Fixes
None.
v0.0.3
Experiment Resources
#1 Introduces the new Experiment Custom Resource Definition which allows the declarative definition of scheduled KFP pipeline Experiments.
apiVersion: pipelines.kubeflow.org/v1
kind: Experiment
Improvements
- #75 Rename Model Update Event Source to Run Completion Event Source which also emits events for failed pipeline runs
- #74 Provide Kubernetes Events for all resource kinds
- #101 Introduce ObservedGeneration Operator best practice