MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency #4996

Open
wants to merge 1 commit into
base: main

Conversation

dkhater-redhat
Contributor

- What I did

- How to verify it

- Description for the changelog

@dkhater-redhat dkhater-redhat marked this pull request as draft April 22, 2025 15:37
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 22, 2025
Contributor

openshift-ci bot commented Apr 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkhater-redhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2025
@dkhater-redhat dkhater-redhat changed the title Change mcdiffb DRAFT MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency Apr 23, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 23, 2025
Contributor

openshift-ci-robot commented Apr 23, 2025

@dkhater-redhat: This pull request references MCO-1580 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 7 times, most recently from 8452b7a to 32555ad Compare April 28, 2025 17:22
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 28, 2025
@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 3 times, most recently from ef4dc03 to 621dc6c Compare April 28, 2025 23:16
@dkhater-redhat dkhater-redhat marked this pull request as ready for review April 28, 2025 23:51
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2025
@dkhater-redhat dkhater-redhat changed the title DRAFT MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency Apr 28, 2025
@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 2 times, most recently from 633afa0 to db06f06 Compare April 29, 2025 15:16
@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 3 times, most recently from 5281786 to 3347699 Compare April 29, 2025 15:29
…g and reusing MOSB's with updated rendered spec
Contributor

openshift-ci bot commented Apr 29, 2025

@dkhater-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | 0c73a25 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/okd-scos-e2e-aws-ovn | 0c73a25 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-gcp-op-ocl | 0c73a25 | link | false | /test e2e-gcp-op-ocl |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

return err
}

if (oldRendered != newRendered && needsImageRebuild) || firstOptIn == "" {
Contributor

@umohnani8 umohnani8 May 12, 2025

There can be a case where the rendered MC hasn't changed but a rebuild is still needed - for example when the rebuild annotation is applied. I think this should be an OR check instead of AND.

Contributor Author

Yeah, there is a flaw with this. Currently, if I do not include that check, we will "reuse an MOSB" the first time we opt into OCL. This needs to be worked out. I forget whether the firstOptIn signal actually works.
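
One possible reading of this thread, as a minimal Go sketch rather than the PR's actual fix: treat an explicit rebuild request and a first opt-in as independent triggers, and keep the rendered-config comparison for the remaining case. The `buildInputs` type, its field names, and `shouldBuild` are hypothetical and do not appear in the PR.

```go
package main

// buildInputs gathers the signals discussed in this thread.
// All field names are hypothetical; none of them come from the PR.
type buildInputs struct {
	OldRendered       string // rendered MachineConfig the existing MOSB was built from
	NewRendered       string // rendered MachineConfig the pool currently targets
	NeedsImageRebuild bool   // the config change affects what is baked into the image
	RebuildRequested  bool   // for example, the rebuild annotation is present
	FirstOptIn        bool   // there is no earlier MOSB whose image could be reused
}

// shouldBuild reports whether a new image build is needed (true) or the
// existing image can be reused (false).
func shouldBuild(in buildInputs) bool {
	// An explicit rebuild request or a first opt-in always builds,
	// even when the rendered config is unchanged.
	if in.FirstOptIn || in.RebuildRequested {
		return true
	}
	// Otherwise build only when the rendered config changed in a way
	// that affects the image contents.
	return in.OldRendered != in.NewRendered && in.NeedsImageRebuild
}
```

With this shape, an unchanged rendered config plus a rebuild request still triggers a build, while a changed config that does not affect the image does not.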

// but populates its status from oldMosb so that no build actually runs.
func (b *buildReconciler) reuseImageForNewMOSB(ctx context.Context, mosc *mcfgv1.MachineOSConfig, oldMosb *mcfgv1.MachineOSBuild,
) error {
// Look up the MCP
Contributor

Might need to add a check here to verify that the image still exists in the registry

Contributor Author

It should still exist? Is there a scenario where the MOSB would cite an image that doesn't exist in the registry? We are reusing the image from the existing build for the new MOSB.

Contributor

Yes, if the user deletes the image from the registry, the MOSB has no idea that it is gone. There was a bug about this, so we added a skopeo inspect check to ensure that the image still exists before continuing. If it doesn't, we rebuild the MOSB; this is the other scenario in which a rebuild with the same name happens. These are the PRs that fixed it: #4975 and #4807.
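
The check suggested above could be placed in front of the reuse path roughly as follows. This is a sketch under stated assumptions: `imageChecker`, `reuseAction`, and `decideReuse` are hypothetical names, and the checker is a stand-in for whatever registry inspection the controller really performs (the thread mentions a skopeo inspect check), not its actual API.

```go
package main

import "fmt"

// imageChecker reports whether a pullspec still resolves in the registry.
// Hypothetical stand-in for the controller's real inspection logic.
type imageChecker func(pullspec string) (bool, error)

// reuseAction is the outcome of the pre-reuse check.
type reuseAction string

const (
	actionReuse   reuseAction = "reuse"
	actionRebuild reuseAction = "rebuild"
)

// decideReuse returns actionReuse only when the old build recorded an image
// and that image is still present; in every other non-error case it falls
// back to a rebuild.
func decideReuse(pullspec string, exists imageChecker) (reuseAction, error) {
	if pullspec == "" {
		// The old build never recorded an image, so there is nothing to reuse.
		return actionRebuild, nil
	}
	ok, err := exists(pullspec)
	if err != nil {
		// Surface lookup failures instead of guessing either way.
		return "", fmt.Errorf("checking image %q: %w", pullspec, err)
	}
	if !ok {
		// The image was deleted from the registry after the build finished.
		return actionRebuild, nil
	}
	return actionReuse, nil
}
```

Passing the checker in as a function keeps the decision testable without a registry.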

}

klog.Info("getting nodes for pool")
Contributor

Maybe add the pool name that we are getting the nodes for.

Same for the other info logs below, but maybe those were for your own debugging and were left behind.

Contributor Author

Going to clean those logs up at the end. This was for debugging purposes.

@@ -676,9 +676,9 @@ func (dn *Daemon) initializeNode() error {
 //nolint:gocyclo
 func (dn *Daemon) syncNode(key string) error {
 	startTime := time.Now()
-	klog.V(4).Infof("Started syncing node %q (%v)", key, startTime)
+	klog.Infof("Started syncing node %q (%v)", key, startTime)
Contributor

Did you mean to change the level of this log? Same for the one right below

Contributor Author

Will change back when the PR looks good. This was for debugging purposes.

if err != nil {
return err
}
newMosb.SetOwnerReferences([]metav1.OwnerReference{
Member

thought (non-blocking): We may want to push the SetOwnerReferences() part into the NewMachineOSBuild() constructor since we already have the MOSC there and it feels like something the MOSB constructor should be setting.

Contributor Author

Yeah, this got a little hacky because I was trying to reuse the preexisting functions to reuse the MOSB, but that path would feed into creating a new job regardless. I'll look into this.
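
The constructor idea from this thread might look roughly like the sketch below. The types are local stand-ins so that the example compiles without Kubernetes dependencies: `ownerReference` only mirrors a few fields of `metav1.OwnerReference`, and `machineOSConfig`, `machineOSBuild`, and `newMachineOSBuild` are hypothetical simplifications, not the real `NewMachineOSBuild()` signature.

```go
package main

// ownerReference mirrors a few fields of metav1.OwnerReference.
// Local stand-in; the real type has more fields.
type ownerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// machineOSConfig and machineOSBuild are simplified, hypothetical stand-ins
// for the MOSC and MOSB API objects.
type machineOSConfig struct {
	Name string
	UID  string
}

type machineOSBuild struct {
	Name            string
	OwnerReferences []ownerReference
}

// newMachineOSBuild sets the owner reference at construction time, so callers
// such as the reuse path cannot forget to attach the owning MOSC.
func newMachineOSBuild(name string, mosc machineOSConfig) machineOSBuild {
	return machineOSBuild{
		Name: name,
		OwnerReferences: []ownerReference{{
			APIVersion: "machineconfiguration.openshift.io/v1",
			Kind:       "MachineOSConfig",
			Name:       mosc.Name,
			UID:        mosc.UID,
		}},
	}
}
```

Every code path that creates an MOSB then gets the same ownership metadata, which is the consistency benefit the reviewer is pointing at.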

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
5 participants