Reversible upgrades #1096
Thank you for reporting your feedback to us! The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6344.
Canary
There are some projects that follow this approach, e.g. Istio. For this strategy we need to ensure that all KF sub-components (Istio, KServe, KFP, etc.) can support canary upgrades. This means that the control plane of each of the different apps will need to ensure that they:
Some more mature projects like Istio support canary upgrades, but a lot of Kubeflow components don't provide such a mechanism. This results in the following potentially problematic scenarios, in which we can't have 2 versions of:
Regarding CRDs: the pattern in K8s that a lot of controllers follow is to have a conversion webhook, since the K8s API might persist an older version of a CR while a component requests a newer version.
Because of the above, we can't do canary upgrades in Kubeflow.
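To illustrate the conversion webhook pattern, here is a minimal Python sketch of the conversion logic such a webhook runs. It follows the `ConversionReview` request/response shape of `apiextensions.k8s.io/v1`, but the API group `example.kubeflow.org`, the `v1alpha1`/`v1` versions and the `spec.image` to `spec.containerImage` rename are hypothetical, and the HTTPS server wiring is left out.

```python
import copy
from typing import Any, Dict

# Hypothetical API group and versions, for illustration only.
GROUP = "example.kubeflow.org"
OLD = f"{GROUP}/v1alpha1"
NEW = f"{GROUP}/v1"


def convert_object(obj: Dict[str, Any], desired: str) -> Dict[str, Any]:
    """Convert one CR between the hypothetical v1alpha1 and v1 schemas."""
    out = copy.deepcopy(obj)  # never mutate the object sent by the API server
    current = out["apiVersion"]
    if current == desired:
        return out
    spec = out.setdefault("spec", {})
    if current == OLD and desired == NEW:
        # v1 renamed spec.image to spec.containerImage (hypothetical change).
        if "image" in spec:
            spec["containerImage"] = spec.pop("image")
    elif current == NEW and desired == OLD:
        if "containerImage" in spec:
            spec["image"] = spec.pop("containerImage")
    else:
        raise ValueError(f"unsupported conversion {current} -> {desired}")
    out["apiVersion"] = desired
    return out


def handle_conversion_review(review: Dict[str, Any]) -> Dict[str, Any]:
    """Build the ConversionReview response for an incoming request."""
    request = review["request"]
    response: Dict[str, Any] = {"uid": request["uid"]}  # must echo the request uid
    try:
        response["convertedObjects"] = [
            convert_object(o, request["desiredAPIVersion"]) for o in request["objects"]
        ]
        response["result"] = {"status": "Success"}
    except ValueError as exc:
        response["result"] = {"status": "Failure", "message": str(exc)}
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "ConversionReview",
        "response": response,
    }
```

Note that this only covers translating stored objects between versions; it does not by itself make it safe to run two controller versions against the same CRs.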
Blue / Green
In this case we'll need to install a new KF version in a separate cluster and then move all the state to the new cluster. This method gives us rollback support (reversible upgrades) out of the box, since the original installation is left intact. The main drawbacks to this are:
These result mainly in more cost and time to do the upgrade, but the approach is safer in terms of rollback. Going down this approach also means we will end up with a full backup and restore strategy.
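The rollback property of Blue / Green can be sketched as a small state model. This is a hypothetical illustration, not existing tooling: `BlueGreenUpgrade`, `copy_state` and `healthy` are invented names, and the cluster names are placeholders. The point it shows is that the blue cluster is only read from, so rolling back is just switching users back to it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BlueGreenUpgrade:
    """Hypothetical model of a Blue / Green upgrade between two KF clusters.

    `blue` is the existing installation and `green` the new one. Blue is never
    modified, so a rollback only needs to point users back at it.
    """

    blue: str
    green: str
    active: str = ""
    previous: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Users start on the existing (blue) installation.
        if not self.active:
            self.active = self.blue

    def cut_over(
        self,
        copy_state: Callable[[str, str], None],
        healthy: Callable[[str], bool],
    ) -> bool:
        """Copy state blue -> green, then switch users only if green is healthy."""
        copy_state(self.blue, self.green)
        if not healthy(self.green):
            # Green failed validation: blue is still active and untouched.
            return False
        self.previous.append(self.active)
        self.active = self.green
        return True

    def roll_back(self) -> None:
        """Switch users back to the cluster that was active before the cut-over."""
        if self.previous:
            self.active = self.previous.pop()
```

One caveat the model makes visible: anything users create on green after the cut-over is not carried back by `roll_back`, so a backup and restore story is still needed for that window.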
In-Place
In this case we will refresh every charm to the new version, which is essentially what our current upgrade instructions do. There are some immediate limitations we need to expose:
The main benefit of this approach is that we don't have to move data across and it's relatively more straightforward. The downside is that we don't have a silver-bullet approach for refreshing back to an older version in case of issues (e.g. charms ending up in a blocked state).
From the above, the most promising one IMO is the Blue / Green upgrade strategy, as it
To fully implement this strategy, though, we will need to ensure we can copy over all state from one cluster to the next. By state we mean:
For the control plane we already have manual steps for backup and restore. The missing piece is a story for taking a snapshot of Profile CRs and of user namespace objects and their contents.
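A minimal Python sketch of the missing Profile snapshot step, assuming the Profiles and the namespaced objects have already been exported as JSON (for example the `items` of `kubectl get profiles -o json`) and loaded as dicts. It strips server-populated fields so the objects can be re-applied on the new cluster, and groups objects under the Profile whose namespace they live in (a Kubeflow Profile owns the namespace with the same name). The `sanitize`/`snapshot` helpers and the snapshot layout are hypothetical; volume contents, `ownerReferences` and controller-generated objects are not handled.

```python
import copy
from typing import Any, Dict, Iterable

# Server-populated metadata that should not be re-applied on the target cluster.
SERVER_FIELDS = ("uid", "resourceVersion", "creationTimestamp", "generation",
                 "managedFields", "selfLink")
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def sanitize(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a K8s object without status and server-set metadata."""
    clean = copy.deepcopy(obj)
    clean.pop("status", None)
    meta = clean.get("metadata", {})
    for key in SERVER_FIELDS:
        meta.pop(key, None)
    annotations = meta.get("annotations")
    if annotations:
        annotations.pop(LAST_APPLIED, None)
        if not annotations:
            meta.pop("annotations")
    return clean


def snapshot(profiles: Iterable[Dict[str, Any]],
             namespaced_objects: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Group sanitized Profiles with the sanitized objects in their namespace."""
    snap: Dict[str, Any] = {}
    for profile in profiles:
        name = profile["metadata"]["name"]
        snap[name] = {"profile": sanitize(profile), "objects": []}
    for obj in namespaced_objects:
        ns = obj.get("metadata", {}).get("namespace")
        if ns in snap:  # objects outside user namespaces are skipped
            snap[ns]["objects"].append(sanitize(obj))
    return snap
```

Restoring would then mean applying each Profile first, waiting for its namespace to be created, and applying the grouped objects into it.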
Context
We need to have a story for being able to do reversible upgrades. The goal is that if during an upgrade something goes wrong that we can revert to a state before the upgrade.
There are 3 approaches for this:
Canary upgrades
Blue-Green upgrades
In-Place upgrades
As part of the reversible upgrades story, we'll need a plan for how to approach reversible upgrades.
What needs to get done
Definition of Done