MCO-1002: Add a flag to allow irreconcilable configs #2244

Open
wants to merge 1 commit into
base: master
from

Conversation

pablintino

The flag tells the operator to skip validation of irreconcilable fields and lets the user update or patch conflicting MCs in the cluster at their own risk. This feature is specifically intended to let users add new nodes with a newer configuration to a long-standing cluster.

@openshift-ci-robot

openshift-ci-robot commented Mar 20, 2025

@pablintino: This pull request references MCO-1002 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

In response to this:

The flag tells the operator to skip validation of irreconcilable fields and lets the user update or patch conflicting MCs in the cluster at their own risk. This feature is specifically intended to let users add new nodes with a newer configuration to a long-standing cluster.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 20, 2025
Contributor

openshift-ci bot commented Mar 20, 2025

Hello @pablintino! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 20, 2025
// ignoreIrreconcilableConfig tells the operator to ignore irreconcilable configuration changes
// on already existing nodes. New nodes joining the cluster will see the newest configuration.
// +optional
IgnoreIrreconcilableConfig bool `json:"ignoreIrreconcilableConfig"`
Contributor

Do not use a boolean here. Things generally start as booleans but can end up progressing to need more than a true/false option. Make this an enum instead so that you can add additional options in the future if needed. Using a boolean makes it so you cannot change this to support other options in the future.

For more information on why we encourage enums over booleans see: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#do-not-use-boolean-fields

Contributor

Additionally, this is being added to a v1 API - should this be feature-gated?

Author

@everettraven I totally agree; what you propose is nicer and more maintainable. I've updated the PR to use a proper enum with a default value that means "behave as you do today", that is, validate the configs.
About the feature gate: @yuqi-zhang thought, and I agree with him, that this shouldn't require one, since it's a knob that lets some customers (informed beforehand about the implications of using the non-default value) skip the validation of the MCO config for new nodes. I'll let him reply when he is back from PTO in 3 weeks, as there's no rush with this PR.

@pablintino pablintino force-pushed the irreconcilable-config branch 2 times, most recently from 6fd81f1 to aaf341e Compare March 24, 2025 17:20
// Valid values are Strict and Relaxed:
// Strict: Rejects changes to MachineConfigs if fields that doesn't support to be updated are changed.
// Relaxed: Changes to protected fields are allowed and will be applied in new nodes joining the cluster.
// +kubebuilder:validation:Enum=Strict;Relaxed
Contributor

Create constants for the valid types.

Also, because this is an optional field, "" is a valid value.
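For illustration only (this is not code from the PR), the request above could look roughly like the sketch below. The type and constant names are hypothetical; only the values Strict, Relaxed, and the empty string come from this thread.

```go
package main

import "fmt"

// ValidationPolicy is a hypothetical name for the enum type discussed above.
type ValidationPolicy string

const (
	// ValidationPolicyStrict rejects irreconcilable changes.
	ValidationPolicyStrict ValidationPolicy = "Strict"
	// ValidationPolicyRelaxed allows them; they only reach new nodes.
	ValidationPolicyRelaxed ValidationPolicy = "Relaxed"
	// ValidationPolicyOmitted is the zero value of an optional field.
	ValidationPolicyOmitted ValidationPolicy = ""
)

// IsAllowed reports whether p is one of the accepted values,
// including the empty string that an omitted optional field produces.
func IsAllowed(p ValidationPolicy) bool {
	switch p {
	case ValidationPolicyStrict, ValidationPolicyRelaxed, ValidationPolicyOmitted:
		return true
	default:
		return false
	}
}

func main() {
	for _, p := range []ValidationPolicy{"Strict", "Relaxed", "", "Lenient"} {
		fmt.Printf("%q allowed: %t\n", string(p), IsAllowed(p))
	}
}
```

Named constants keep callers from scattering string literals, and listing the empty string explicitly makes the "omitted" case visible in code.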

Author

Done, thanks!

Comment on lines 68 to 69
// +kubebuilder:validation:Default=Strict
// +default="Strict"
Contributor

Do you really want to set an explicit default here? Will you always and forever default to Strict?

A common pattern when it comes to defaulting for enums in configuration type APIs is to have the system choose the default. This allows us to change it as we see fit, where setting an explicit default does not.
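A minimal sketch of the "system chooses the default" pattern, assuming the operator resolves an empty value at read time. The names `EffectivePolicy` and `currentDefault` are hypothetical and do not come from the MCO code base.

```go
package main

import "fmt"

// ValidationPolicy is a hypothetical name for the enum type under review.
type ValidationPolicy string

const (
	PolicyStrict  ValidationPolicy = "Strict"
	PolicyRelaxed ValidationPolicy = "Relaxed"
)

// currentDefault lives in operator code rather than in the API schema,
// so it can change in a later release without an API change.
const currentDefault = PolicyStrict

// EffectivePolicy resolves the value the operator acts on.
// An empty value means "no opinion" and falls back to currentDefault.
func EffectivePolicy(configured ValidationPolicy) ValidationPolicy {
	if configured == "" {
		return currentDefault
	}
	return configured
}

func main() {
	fmt.Println(EffectivePolicy(""))            // prints "Strict"
	fmt.Println(EffectivePolicy(PolicyRelaxed)) // prints "Relaxed"
}
```

Because nothing is written back to the stored object on admission, existing objects with the field omitted pick up a new default automatically if the operator ever changes it.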

Author

I agree with what you propose. I've added the empty string as an option and I'll code the MCO to consider the empty string as a valid input.

@@ -56,6 +58,17 @@ type MachineConfigurationSpec struct {
// +openshift:enable:FeatureGate=NodeDisruptionPolicy
// +optional
NodeDisruptionPolicy NodeDisruptionPolicyConfig `json:"nodeDisruptionPolicy"`


// configurationValidationPolicy tells the operator how new machine configurations should be validated.
Contributor

Suggested change
- // configurationValidationPolicy tells the operator how new machine configurations should be validated.
+ // configurationValidationPolicy is an optional field that allows configuring the level of validation performed on new machine configurations.

Is this only done on new machine configurations or does this also apply to changes to existing machine configurations?

Author

It applies only to new changes. The MCO gathers all the machine configurations for each pool of nodes, merges them into a single one (called the rendered config), and performs the validations on that rendered config.
This toggle lets the user skip the check for changes between the "current" and the "new" rendered MCs. It sounds odd, but there are use cases that justify it. The main one is a customer who deployed a cluster four years ago with a particular filesystem configuration, X. Now they want to add new nodes, but X is no longer valid because the hardware is different. With the current approach, we validate the new rendered MC and reject it, since changes in the filesystem sections are not allowed. With this option, the customer acknowledges that we are not validating and that their configs may be problematic, but they can handle a scenario like the one I described. New nodes take the latest MC and old nodes simply ignore the changes.
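The flow described above can be sketched as a toy model. Everything here is an assumption made for illustration: `RenderedConfig` is a simplified stand-in for a rendered MachineConfig, the section names in `irreconcilableSections` are placeholders, and `Validate` is not the MCO's real validation function.

```go
package main

import "fmt"

// Policy is a hypothetical name for the validation policy enum.
type Policy string

const (
	Strict  Policy = "Strict"
	Relaxed Policy = "Relaxed"
)

// RenderedConfig is a toy rendered config: section name -> serialized content.
type RenderedConfig map[string]string

// irreconcilableSections lists placeholder section names that, in this
// sketch, cannot be changed on existing nodes.
var irreconcilableSections = []string{"storage.filesystems", "storage.disks"}

// Validate compares the current and the new rendered config.
// Any policy other than Relaxed (including the empty value) is treated as Strict.
func Validate(policy Policy, current, next RenderedConfig) error {
	if policy == Relaxed {
		// The check is skipped: only new nodes will see the change.
		return nil
	}
	for _, section := range irreconcilableSections {
		if current[section] != next[section] {
			return fmt.Errorf("irreconcilable change in section %q", section)
		}
	}
	return nil
}

func main() {
	current := RenderedConfig{"storage.filesystems": "xfs on /dev/sda"}
	next := RenderedConfig{"storage.filesystems": "xfs on /dev/nvme0n1"}

	fmt.Println("Strict: ", Validate(Strict, current, next))
	fmt.Println("Relaxed:", Validate(Relaxed, current, next))
}
```

In this model the Strict call returns an error because the filesystem section differs, while the Relaxed call returns nil and leaves existing nodes untouched.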

// +kubebuilder:validation:Default=Strict
// +default="Strict"
// +optional
ConfigurationValidationPolicy MachineConfigurationValidationPolicy `json:"configurationValidationPolicy,omitempty"`
Contributor

Is MachineConfigurationValidationPolicy more descriptive for a user as to what this applies to?

Author

Changed!



// configurationValidationPolicy tells the operator how new machine configurations should be validated.
// Valid values are Strict and Relaxed:
Contributor

We generally follow this format for stating allowed enum values to try and keep consistent across OCP APIs:

Suggested change
- // Valid values are Strict and Relaxed:
+ // Allowed values are Strict, Relaxed, and omitted.

Contributor

Also, add a description for what happens when leaving this omitted, like:

  • If you are using a system-chosen default (i.e., the operator chooses):

    When omitted, this means no-opinion and the system is left to choose a default. Currently the default is {default}.

  • If you are using an explicit default (i.e., defaulted on admission):

    When omitted, defaults to {default}.
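As a hedged example of the first template applied to this field: the sketch below uses a cut-down, hypothetical `Spec` struct, assumes the system-chosen default is currently Strict, and assumes the empty string is listed in the enum marker. Only the field name and JSON tag are taken from the diff in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ValidationPolicy is a hypothetical name for the enum type.
type ValidationPolicy string

// Spec is a cut-down stand-in for MachineConfigurationSpec.
type Spec struct {
	// configurationValidationPolicy is an optional field that allows configuring
	// the level of validation performed on new machine configurations.
	// Allowed values are Strict, Relaxed, and omitted.
	// When omitted, this means no opinion and the system is left to choose a default.
	// Currently the default is Strict (assumed here for illustration).
	// +kubebuilder:validation:Enum=Strict;Relaxed;""
	// +optional
	ConfigurationValidationPolicy ValidationPolicy `json:"configurationValidationPolicy,omitempty"`
}

// Encode serializes a Spec to JSON and returns "" on error.
func Encode(s Spec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(Encode(Spec{}))                                         // {}
	fmt.Println(Encode(Spec{ConfigurationValidationPolicy: "Relaxed"})) // {"configurationValidationPolicy":"Relaxed"}
}
```

With `omitempty`, an unset policy does not appear in the serialized object, which is what lets the operator treat it as "no opinion".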

Author

Thanks for both comments. I think the best option is what you proposed in another comment: let the system decide which option is the default.


// configurationValidationPolicy tells the operator how new machine configurations should be validated.
// Valid values are Strict and Relaxed:
// Strict: Rejects changes to MachineConfigs if fields that doesn't support to be updated are changed.
Contributor

We generally follow this format on OpenShift APIs when talking about what happens when setting a specific allowed value:

Suggested change
- // Strict: Rejects changes to MachineConfigs if fields that doesn't support to be updated are changed.
+ // When set to Strict, changes to MachineConfigs fields that doesn't support to be updated are rejected.

What does "doesn't support to be updated" mean? Are these the "protected" fields you mention in the description of the Relaxed setting?

Author

I've added a bit more information in the new patch, including a link to our docs.
I think the new comment in the new patch may express what I mean a bit better.

// configurationValidationPolicy tells the operator how new machine configurations should be validated.
// Valid values are Strict and Relaxed:
// Strict: Rejects changes to MachineConfigs if fields that doesn't support to be updated are changed.
// Relaxed: Changes to protected fields are allowed and will be applied in new nodes joining the cluster.
Contributor

Suggested change
- // Relaxed: Changes to protected fields are allowed and will be applied in new nodes joining the cluster.
+ // When set to Relaxed, changes to protected fields are allowed and will be applied in new nodes joining the cluster.

@pablintino pablintino closed this Apr 7, 2025
@pablintino pablintino force-pushed the irreconcilable-config branch from aaf341e to 5655594 Compare April 7, 2025 12:00
@openshift-ci openshift-ci bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 7, 2025
@pablintino pablintino reopened this Apr 7, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 7, 2025

@pablintino: This pull request references MCO-1002 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

In response to this:

The flag tells the operator to skip validation of irreconcilable fields and lets the user update or patch conflicting MCs in the cluster at their own risk. This feature is specifically intended to let users add new nodes with a newer configuration to a long-standing cluster.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 7, 2025
@pablintino
Author

@everettraven Thanks for your input, Bryce. I've updated the patch accordingly.

Contributor

openshift-ci bot commented Apr 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pablintino
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed
Contributor

Is there an appropriate enhancement that describes how this is going to work? And in particular I'm concerned about how we will support this?

Is this field supportable?

This will have to go behind a feature gate at first as well

@pablintino pablintino force-pushed the irreconcilable-config branch from 3b81d2a to ef71013 Compare April 30, 2025 13:29
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 30, 2025
@pablintino
Author

Is there an appropriate enhancement that describes how this is going to work? And in particular I'm concerned about how we will support this?

Is this field supportable?

This will have to go behind a feature gate at first as well

Hi @JoelSpeed. I've updated the patch, adding a feature gate and referencing the enhancement I created, which is still a draft.

@pablintino pablintino force-pushed the irreconcilable-config branch from ef71013 to faf46db Compare April 30, 2025 14:28
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
The flag tells the operator to skip validation of irreconcilable fields
and lets the user update or patch conflicting MCs in the cluster at their
own risk. This feature is specifically intended to let users add new
nodes with a newer configuration to a long-standing cluster.
@pablintino pablintino force-pushed the irreconcilable-config branch from faf46db to 5fb3a30 Compare April 30, 2025 14:32
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2025
@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

openshift-ci bot commented May 15, 2025

@pablintino: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify-crd-schema | 5fb3a30 | link | true | /test verify-crd-schema |
| ci/prow/e2e-aws-ovn-hypershift | 5fb3a30 | link | true | /test e2e-aws-ovn-hypershift |
| ci/prow/verify | 5fb3a30 | link | true | /test verify |
| ci/prow/okd-scos-e2e-aws-ovn | 5fb3a30 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-serial-1of2 | 5fb3a30 | link | true | /test e2e-aws-serial-1of2 |
| ci/prow/e2e-aws-serial-techpreview-1of2 | 5fb3a30 | link | true | /test e2e-aws-serial-techpreview-1of2 |
| ci/prow/e2e-aws-serial-techpreview-2of2 | 5fb3a30 | link | true | /test e2e-aws-serial-techpreview-2of2 |
| ci/prow/e2e-aws-serial-2of2 | 5fb3a30 | link | true | /test e2e-aws-serial-2of2 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
5 participants