
[release-4.19] OCPBUGS-56180: Add hot loop detection in the boot image controller #5050


openshift-cherrypick-robot

This is an automated cherry-pick of #5037

/assign djoshy

@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-55967 has been cloned as Jira Issue OCPBUGS-56180. Will retitle bug to link to clone.
/retitle [release-4.19] OCPBUGS-56180: Add hot loop detection in the boot image controller

In response to this:

This is an automated cherry-pick of #5037

/assign djoshy

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot changed the title from "[release-4.19] OCPBUGS-55967: Add hot loop detection in the boot image controller" to "[release-4.19] OCPBUGS-56180: Add hot loop detection in the boot image controller" May 14, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 14, 2025
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-56180, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-55967 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is POST instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #5037

/assign djoshy


@openshift-ci openshift-ci bot requested review from pablintino and umohnani8 May 14, 2025 11:22
Contributor

openshift-ci bot commented May 14, 2025

@openshift-cherrypick-robot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | 6afdfa1 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/e2e-gcp-op-ocl | 6afdfa1 | link | false | /test e2e-gcp-op-ocl |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dkhater-redhat
Contributor

/lgtm
/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label May 14, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2025
Contributor

openshift-ci bot commented May 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkhater-redhat, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2025
@ptalgulk01

Pre-merge verification:

Verified on an IPI-based AWS 4.19 cluster.

  1. Record the boot image value before editing.
  2. Manually change the boot image for a MachineSet more than 3 times:
  • On GCP: Change the disk.image in the providerSpec.
  • On AWS: Change the ami.id in the providerSpec.
$ oc edit machinesets.machine.openshift.io  -n openshift-machine-api ci-ln-gkchc1t-76ef8-4dhkt-worker-us-east-2a -o yaml
....
       spec:
       ...
          ami:
            id: ami-fake
  3. The first few times, the value is reverted to the original. After that, the edited value persists and the machine-config operator becomes degraded.
$ oc get co machine-config 
NAME             VERSION                                                AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config                             4.19.0-0-2025-05-15-052029-test-ci-ln-gkchc1t-latest   True        False         True       112m    Failed to resync 4.19.0-0-2025-05-15-052029-test-ci-ln-gkchc1t-latest because: bootimage update failed: 1 Degraded MAPI MachineSets | 0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments | Error(s): error syncing MAPI MachineSet ci-ln-gkchc1t-76ef8-4dhkt-worker-us-east-2a: refusing to reconcile machineset ci-ln-gkchc1t-76ef8-4dhkt-worker-us-east-2a, hot loop detected. Please opt-out of boot image updates, adjust your machine provisioning workflow to prevent hot loops and opt back in to resume boot image updates

$ oc get machineconfigurations -o yaml
    - lastTransitionTime: "2025-05-15T07:34:23Z"
      message: '1 Degraded MAPI MachineSets | 0 Degraded CAPI MachineSets | 0 CAPI
        MachineDeployments | Error(s): error syncing MAPI MachineSet ci-ln-gkchc1t-76ef8-4dhkt-worker-us-east-2a:
        refusing to reconcile machineset ci-ln-gkchc1t-76ef8-4dhkt-worker-us-east-2a,
        hot loop detected. Please opt-out of boot image updates, adjust your machine
        provisioning workflow to prevent hot loops and opt back in to resume boot
        image updates'
      reason: MAPIMachinesetUpdated
      status: "True"
      type: BootImageUpdateDegraded

  4. Opt out of boot image updates by editing the MachineConfiguration object:
....
  spec:
    logLevel: Normal
    managedBootImages:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: machinesets
        selection:
          mode: None
    managementState: Managed
    operatorLogLevel: Normal
  status:
....
    managedBootImagesStatus:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: machinesets
        selection:
          mode: None
  5. Alternatively, edit the boot image back to its original value.

/label cherry-pick-approved
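
The behavior verified above (the boot image is restored the first few times, then the controller refuses to reconcile and reports a hot loop) can be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not the controller's actual code: the names `detector`, `loopTracker`, and `allowUpdate`, and the threshold of 3, are hypothetical and were chosen to match the "more than 3 times" wording in the verification steps.

```go
package main

import "fmt"

// hotLoopThreshold is an assumed limit for this sketch; the real
// controller's value is not stated in this thread.
const hotLoopThreshold = 3

// loopTracker counts consecutive rewrites of one MachineSet to the
// same target boot image.
type loopTracker struct {
	target string // boot image the controller last wrote
	count  int    // consecutive rewrites to that same target
}

// detector holds one tracker per MachineSet name (hypothetical type).
type detector struct {
	seen map[string]*loopTracker
}

func newDetector() *detector {
	return &detector{seen: make(map[string]*loopTracker)}
}

// allowUpdate records an intended rewrite of machineSet to target.
// It returns an error once the same target has been written more than
// hotLoopThreshold times in a row; a different target resets the count.
func (d *detector) allowUpdate(machineSet, target string) error {
	t, ok := d.seen[machineSet]
	if !ok || t.target != target {
		d.seen[machineSet] = &loopTracker{target: target, count: 1}
		return nil
	}
	t.count++
	if t.count > hotLoopThreshold {
		return fmt.Errorf("refusing to reconcile machineset %s, hot loop detected", machineSet)
	}
	return nil
}

func main() {
	d := newDetector()
	// Simulate something repeatedly overwriting the boot image, so the
	// controller keeps trying to restore the same value.
	for i := 1; i <= 5; i++ {
		if err := d.allowUpdate("worker-us-east-2a", "ami-original"); err != nil {
			fmt.Printf("attempt %d: %v\n", i, err)
			continue
		}
		fmt.Printf("attempt %d: boot image restored\n", i)
	}
}
```

In this sketch the first three attempts succeed and later ones return the error, which mirrors the sequence in steps 2 and 3. How the real controller clears its state after an opt-out is not shown here.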

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label May 15, 2025
@djoshy
Contributor

djoshy commented May 15, 2025

/jira refresh

@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-56180, which is invalid:

  • expected dependent Jira Issue OCPBUGS-55967 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is POST instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@djoshy
Contributor

djoshy commented May 16, 2025

/cherry-pick release-4.18, release-4.17

@openshift-cherrypick-robot
Author

@djoshy: once the present PR merges, I will cherry-pick it on top of release-4.18, in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.18, release-4.17


@djoshy
Contributor

djoshy commented May 16, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label May 16, 2025
@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-56180, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-55967 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-55967 targets the "4.20.0" version, which is one of the valid target versions: 4.20.0
  • bug has dependents
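
The dependent-bug check reported above can be illustrated with a small Go sketch. It assumes the state comparison ignores case, since the bot accepts "Verified" against the list (MODIFIED, ON_QA, VERIFIED). The function name `dependentStateOK` is hypothetical and this is not the jira-lifecycle-plugin's code.

```go
package main

import (
	"fmt"
	"strings"
)

// validDependentStates mirrors the list quoted in the bot comment.
var validDependentStates = []string{"MODIFIED", "ON_QA", "VERIFIED"}

// dependentStateOK reports whether state matches one of the valid
// states. Case-insensitive matching is an assumption of this sketch.
func dependentStateOK(state string) bool {
	for _, s := range validDependentStates {
		if strings.EqualFold(s, state) {
			return true
		}
	}
	return false
}

func main() {
	// "POST" is the state that made the earlier refreshes fail;
	// "Verified" is the state reported once the check passed.
	for _, st := range []string{"POST", "Verified"} {
		fmt.Printf("%s valid: %t\n", st, dependentStateOK(st))
	}
}
```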

Requesting review from QA contact:
/cc @sergiordlr

In response to this:

/jira refresh


@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label May 16, 2025
@openshift-ci openshift-ci bot requested a review from sergiordlr May 16, 2025 13:30
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 6682dd1 and 2 for PR HEAD 6afdfa1 in total

@openshift-merge-bot openshift-merge-bot bot merged commit aae92b4 into openshift:release-4.19 May 16, 2025
19 of 21 checks passed
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-56180: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-56180 has been moved to the MODIFIED state.

In response to this:

This is an automated cherry-pick of #5037

/assign djoshy


@openshift-cherrypick-robot
Author

@djoshy: cannot checkout release-4.18,: error checking out "release-4.18,": exit status 1 error: pathspec 'release-4.18,' did not match any file(s) known to git

In response to this:

/cherry-pick release-4.18, release-4.17


@djoshy
Contributor

djoshy commented May 16, 2025

/cherry-pick release-4.18 release-4.17
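
The earlier failure on `release-4.18,` is consistent with the branch list being split on whitespace only, which leaves the comma attached to the first branch name. The Go sketch below shows that effect. The function `branchArgs` is hypothetical, and the whitespace-only parsing rule is an assumption inferred from the error message, not taken from the bot's source.

```go
package main

import (
	"fmt"
	"strings"
)

// branchArgs returns the whitespace-separated arguments that follow
// "/cherry-pick". Splitting on whitespace only is an assumed rule.
func branchArgs(comment string) []string {
	rest := strings.TrimPrefix(strings.TrimSpace(comment), "/cherry-pick")
	return strings.Fields(rest)
}

func main() {
	// Comma-separated form: the comma stays part of the first token.
	fmt.Printf("%q\n", branchArgs("/cherry-pick release-4.18, release-4.17"))
	// Space-separated form: both tokens are clean branch names.
	fmt.Printf("%q\n", branchArgs("/cherry-pick release-4.18 release-4.17"))
}
```

With the comma-separated command the first token is `release-4.18,`, which matches the pathspec in the checkout error.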

@openshift-cherrypick-robot
Author

@djoshy: new pull request created: #5062

In response to this:

/cherry-pick release-4.18 release-4.17


@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator
This PR has been included in build ose-machine-config-operator-container-v4.19.0-202505161941.p0.gaae92b4.assembly.stream.el9.
All builds following this will include this PR.

@openshift-merge-robot
Contributor

Fix included in accepted release 4.19.0-0.nightly-2025-05-17-005111

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.