@@ -698,133 +698,29 @@ The deprecation strategy is described in the OVN-Kubernetes
## Upgrade / Downgrade Strategy

- If applicable, how will the component be upgraded and downgraded? Make sure this
- is in the test plan.
-
- Consider the following in developing an upgrade/downgrade strategy for this
- enhancement:
- - What changes (in invocations, configurations, API use, etc.) is an existing
- cluster required to make on upgrade in order to keep previous behavior?
- - What changes (in invocations, configurations, API use, etc.) is an existing
- cluster required to make on upgrade in order to make use of the enhancement?
-
- Upgrade expectations:
- - Each component should remain available for user requests and
- workloads during upgrades. Ensure the components leverage best practices in handling [voluntary
- disruption](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/). Any exception to
- this should be identified and discussed here.
- - Micro version upgrades - users should be able to skip forward versions within a
- minor release stream without being required to pass through intermediate
- versions - i.e. `x.y.N->x.y.N+2` should work without requiring `x.y.N->x.y.N+1`
- as an intermediate step.
- - Minor version upgrades - you only need to support `x.N->x.N+1` upgrade
- steps. So, for example, it is acceptable to require a user running 4.3 to
- upgrade to 4.5 with a `4.3->4.4` step followed by a `4.4->4.5` step.
- - While an upgrade is in progress, new component versions should
- continue to operate correctly in concert with older component
- versions (aka "version skew"). For example, if a node is down, and
- an operator is rolling out a daemonset, the old and new daemonset
- pods must continue to work correctly even while the cluster remains
- in this partially upgraded state for some time.
-
- Downgrade expectations:
- - If an `N->N+1` upgrade fails mid-way through, or if the `N+1` cluster is
- misbehaving, it should be possible for the user to rollback to `N`. It is
- acceptable to require some documented manual steps in order to fully restore
- the downgraded cluster to its previous state. Examples of acceptable steps
- include:
- - Deleting any CVO-managed resources added by the new version. The
- CVO does not currently delete resources that no longer exist in
- the target version.
+ N/A

## Version Skew Strategy

N/A

## Operational Aspects of API Extensions

- Describe the impact of API extensions (mentioned in the proposal section, i.e. CRDs,
- admission and conversion webhooks, aggregated API servers, finalizers) here in detail,
- especially how they impact the OCP system architecture and operational aspects.
-
- - For conversion/admission webhooks and aggregated apiservers: what are the SLIs (Service Level
- Indicators) an administrator or support can use to determine the health of the API extensions
-
- Examples (metrics, alerts, operator conditions)
- - authentication-operator condition `APIServerDegraded=False`
- - authentication-operator condition `APIServerAvailable=True`
- - openshift-authentication/oauth-apiserver deployment and pods health
-
- - What impact do these API extensions have on existing SLIs (e.g. scalability, API throughput,
- API availability)
+ The proposed `IPPool` CRD must be provisioned by the admin (or the source
+ cluster introspection tool) before the VMs are migrated into OpenShift Virt;
+ otherwise, the VMs will lose the IP addresses they had on the source cluster.

- Examples:
- - Adds 1s to every pod update in the system, slowing down pod scheduling by 5s on average.
- - Fails creation of ConfigMap in the system when the webhook is not available.
- - Adds a dependency on the SDN service network for all resources, risking API availability in case
- of SDN issues.
- - Expected use-cases require less than 1000 instances of the CRD, not impacting
- general API throughput.
+ The gateway for the network must be configured in the cluster UDN CR at
+ creation time, like any other cluster UDN parameter.

- - How is the impact on existing SLIs to be measured and when (e.g. every release by QE, or
- automatically in CI) and by whom (e.g. perf team; name the responsible person and let them review
- this enhancement)
-
- - Describe the possible failure modes of the API extensions.
- - Describe how a failure or behaviour of the extension will impact the overall cluster health
- (e.g. which kube-controller-manager functionality will stop working), especially regarding
- stability, availability, performance and security.
- - Describe which OCP teams are likely to be called upon in case of escalation with one of the failure modes
- and add them as reviewers to this enhancement.
+ Hence, some planning and preparation are required from the admin before the
+ VM owner starts importing VMs into the OpenShift Virt cluster via MTV.

## Support Procedures

- Describe how to
- - detect the failure modes in a support situation, describe possible symptoms (events, metrics,
- alerts, which log output in which component)
-
- Examples:
- - If the webhook is not running, kube-apiserver logs will show errors like "failed to call admission webhook xyz".
- - Operator X will degrade with message "Failed to launch webhook server" and reason "WehhookServerFailed".
- - The metric `webhook_admission_duration_seconds("openpolicyagent-admission", "mutating", "put", "false")`
- will show >1s latency and alert `WebhookAdmissionLatencyHigh` will fire.
-
- - disable the API extension (e.g. remove MutatingWebhookConfiguration `xyz`, remove APIService `foo`)
-
- - What consequences does it have on the cluster health?
-
- Examples:
- - Garbage collection in kube-controller-manager will stop working.
- - Quota will be wrongly computed.
- - Disabling/removing the CRD is not possible without removing the CR instances. Customer will lose data.
- Disabling the conversion webhook will break garbage collection.
-
- - What consequences does it have on existing, running workloads?
-
- Examples:
- - New namespaces won't get the finalizer "xyz" and hence might leak resource X
- when deleted.
- - SDN pod-to-pod routing will stop updating, potentially breaking pod-to-pod
- communication after some minutes.
-
- - What consequences does it have for newly created workloads?
-
- Examples:
- - New pods in namespace with Istio support will not get sidecars injected, breaking
- their networking.
-
- - Does functionality fail gracefully and will work resume when re-enabled without risking
- consistency?
-
- Examples:
- - The mutating admission webhook "xyz" has FailPolicy=Ignore and hence
- will not block the creation or updates on objects when it fails. When the
- webhook comes back online, there is a controller reconciling all objects, applying
- labels that were not applied during admission webhook downtime.
- - Namespaces deletion will not delete all objects in etcd, leading to zombie
- objects when another namespace with the same name is created.
+ TODO

## Infrastructure Needed [optional]

- Use this section if you need things from the project. Examples include a new
- subproject, repos requested, github details, and/or testing infrastructure.
+ We'll need a virt-aware lane with CNV (and MTV) installed so we can e2e test
+ the features.