-
Notifications
You must be signed in to change notification settings - Fork 426
OCPBUGS-55638: OCPBUGS-55637: Add admin_acks handling in the MCO #5027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release-4.18
Are you sure you want to change the base?
Conversation
@djoshy: This pull request references Jira Issue OCPBUGS-55638, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
0bbd233
to
eaebaed
Compare
eaebaed
to
910ea66
Compare
910ea66
to
157c091
Compare
/retest-required |
1 similar comment
/retest-required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally lgtm, will of course need to wait for the docs to exist before we can merge
Some minor questions inline, but I think they're mostly clear
pkg/operator/admin_ack.go
Outdated
BootImageAWSGCPMsg = "GCP or AWS platform with no boot image configuration detected. OCP will automatically opt-in all GCP and AWS clusters that currently do not a boot image configuration in 4.19. Please add a configuration to disable boot image updates if this is not desired. See [insert-doc-link] " | ||
|
||
AArch64BootImageKey = "ack-4.18-aarch64-bootloader-4.19" | ||
AArch64BootImageMsg = "aarch64 nodes detected. Please ensure boot image are updated by following the KCS:(insertlink) prior to upgrading to 4.19." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: updated
sounds like it's an action that always needs to be taken, maybe we should say something like "ensure boot image is not too old" instead? (I couldn't think of a better way to phrase it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense to me! Perhaps we should run the messages through a doc/Trevor review, they might have better insight/worked on this before
} | ||
|
||
// If no machine managers exist, no configuration has been defined | ||
return mcop.Spec.ManagedBootImages.MachineManagers == nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is saying that if the user already has opted in in a previous version, we don't raise the admin ack?
(makes sense, just thinking out loud)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily opt-ed in, but if they have any sort of opinion(all/none/partial), we'll either remove the admin-ack if one exists, or otherwise a no-op
// updateGuardKeyIfNeeded adds a key with a message depending on: | ||
// If guard is needed and key doesn't exist => create it | ||
// If guard is needed and key exists => nothing to do | ||
// If guard is not needed and key exists => delete it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting, in the sense that we do clear stale admin acks, but only before you do the upgrade, since these syncs will be gone after 4.18.
(again, this is fine, just thinking out loud)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so I wanted the user to able to clear the gate either by addressing it via the admin-ack
configmap or via defining an boot image opinion. In 4.19, this should not have any effect per Trevor since these keys only target 4.18(but we can double check that again)
adminCM: buildAdminAckConfigMapWithData(nil), | ||
}, | ||
{ | ||
name: "AWS platform, with a boot image configuration", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests all have a "disabled" boot image configuration (empty machinemanagers), and thereby do not need the admin ack, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, disabled meaning the user wanted to opt out so no need to raise the alarm. I can add comments/make this more verbose if its not very clear in the current state 😄
/retest-required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving an approval for the code, but should wait until docs are in place to merge
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
157c091
to
9f5367d
Compare
Updated to include doc links.
|
@djoshy: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Pre-merge verified :
Edit the oc edit cm admin-acks -n openshift-config -o yaml apiVersion: v1 data: ack-4.18-kube-1.32-api-removals-in-4.19: "true" kind: ConfigMap metadata: annotations: include.release.openshift.io/hypershift: "true" include.release.openshift.io/ibm-cloud-managed: "true" include.release.openshift.io/self-managed-high-availability: "true" kubernetes.io/description: Record administrator acknowledgments of update gates defined in the admin-gates ConfigMap in the openshift-config-managed namespace. release.openshift.io/create-only: "true" creationTimestamp: "2025-05-16T09:20:54Z" name: admin-acks namespace: openshift-config resourceVersion: "36986" uid: 167b0343-a43b-4e6d-b948-fa8c5a22cccc Able to see
After edit as far now able to see the |
/label backport-risk-assessed Not an actual backport |
- What I did
I added a new sync loop in the operator for the
admin-gates
configmap in theopenshift-config-managed
namespace. This sync loop will now begin to add the following keys:ack-4.18-boot-image-opt-out-in-4.19
when a AWS/GCP cluster does not have a boot image configuration.ack-4.18-aarch64-bootloader-4.19
when an aarch64/arm64 node is detected in the cluster.These gates are only relevant for upgrades to 4.19, so they will only need to exist in the MCO's
release-4.18
branch. I've also added a few units for this functionality.- How to verify it
Before testing either case, ensure that any non MCO keys in
admin-gates
have been acknowledged. For example, if there is a key calledack-4.18-kube-1.32-api-removals-in-4.19
inadmin-gates
, add this key to theadmin-acks
configmap in theopenshift-config
namespace with its value set to true like so:Upgradeable=False
(check viaoc get clusterversion -o yaml
). Now, add a boot image configuration - the CVO should no longer be settingUpgradeable
to false.Upgradeable=False
(check viaoc get clusterversion -o yaml
).