
MCO-1676: MachineConfig Non-Reconcilable Changes #1785


Draft
wants to merge 1 commit into base: master

Conversation

pablintino

This enhancement adds the ability to skip non-reconcilable-changes validation in the MCO.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 30, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 30, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 30, 2025

@pablintino: This pull request references MCO-1002 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

In response to this:

This enhancement adds the ability to skip non-reconcilable-changes validation in the MCO.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Apr 30, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Apr 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign coverprice for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@yuqi-zhang yuqi-zhang left a comment


Generally makes sense to me! Some comments inline

why it will need to be skipped under certain, specific circumstances. There are
known customer use-cases that require making non-reconcilable MC changes.

## Motivation
Contributor


For the motivation section, I think we can talk a bit more about the limitations today in terms of use cases. Maybe a snippet after the non-reconcileable changes like:

Once you configure install-time parameters, the MCO will prevent any further changes to non-reconcileable fields. While this is generally useful for safety, it becomes problematic for any user who wishes to change install-time only parameters, such as disk partition schema. In the worst case, this would prevent scaleup of new nodes with any differences incompatible with existing MachineConfigs.

For these users, their only real option today would be to reprovision their cluster with new install-time configuration, which is costly and time consuming. We would like to introduce the ability for users in these scenarios to instruct the MCO to allow for unreconcileable MachineConfig changes to be applied, skipping existing nodes, and serve this configuration to new nodes joining the cluster. Invalid ignition is not considered in this case.

Just a suggestion, feel free to adapt this. This way we can shorten the motivation section a bit and be a bit more clear that the issue isn't necessarily allowing non-supported changes, but to work around them with user acknowledgement.

Author


A good suggestion! I have used it directly because, to be honest, I do not think I have a better way to express it. To orient readers (influenced by other comments in the PR), I added a small introduction sentence to state that Ignition is the one doing the initial one-off config.


* As a cluster admin, I am adding new nodes to a long-standing cluster and I
would like to change the partitions for the new nodes.
* As a cluster admin, I am adding new nodes to a long-standing cluster and the
Contributor


kernel argument changes are generally allowed. Perhaps we can use another motivation here

Author


Got it! Changed!

## Proposal

Update the MachineConfiguration CR to add a new field in the spec to give
users the ability to disable MC validation. The field will default to the
Contributor


Instead of disabling MC validation, maybe we should phrase it as "allow irreconcileable MachineConfig changes to bypass validation"

Author


Nice. I've rewritten the whole sentence to make it a bit shorter and more readable, including your changes.


The MachineConfig Controller and the MachineConfigDaemons will
read the new field at runtime, and if the value explicitly states that the
validation should be skipped they will let the MC pass and get applied to
Contributor


We should clarify that the MCD would not actually perform a node update for any irreconcileable fields that were changed, but instead just apply the regularly supported fields, and only update the current node annotations and report the node as updated if all the changes were irreconcileable originally.

Author


Done.

validation should be skipped they will let the MC pass and get applied to
the nodes.

### Workflow Description
Contributor


We should talk about how this config would apply to existing nodes. I think we'd still update their annotations, but have the daemon recognize it's a no-op update. WDYT?

Author

@pablintino pablintino May 1, 2025


Good one that I didn't think about.
If I did my research properly, the annotations we will look at/update are the ones this code updates, like machineconfiguration.openshift.io/state, machineconfiguration.openshift.io/currentConfig and machineconfiguration.openshift.io/desiredConfig, right?
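For reference, a minimal sketch of how those annotations might look on a Node object mid-update (the node name and rendered-config names are hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-0   # hypothetical node name
  annotations:
    # rendered MachineConfig currently applied to the node
    machineconfiguration.openshift.io/currentConfig: rendered-worker-aaa111
    # rendered MachineConfig the node should converge to
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-bbb222
    # MCD progress state, e.g. Working while an update is in flight
    machineconfiguration.openshift.io/state: Working
```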

Author


I've updated it to explicitly state what happens with the state and the current config annotations.

Contributor


I've got some more questions to try and get a bit more clarity here:

  • Would updating annotations for existing nodes potentially send a false-positive signal to users that a node has been updated to reflect their desired changes?
  • If you are able to have the daemon recognize no-op updates, why do you need to have these validations in the first place?
    • How are failures of these validations communicated to users today?
    • Could we leverage an existing communication layer to inform users that the changes they are making will only be applied to new nodes and ignored for existing nodes?

Author

@pablintino pablintino May 2, 2025


I think the first point is related to, and answered by, my answer to "Wouldn't this be misleading to users if we update this annotation on existing nodes that aren't actually configured using the "updated" config?".
For the second point: the validations are done before the MCD enters the scene, in our MCO controller, and if the MCs pass the validation in the controller, the node "desired" annotation is updated and the MCD starts the update process.
About the reporting of the errors: we report the irreconcilable error (the first field to violate the rules) in the MCP CR by adding the RenderDegraded condition to the status. The condition message has the details of which field is making the validation fail.
About the last point: I have the feeling that giving that information will add more confusion, as it may make the user think that new configs cannot target "old" nodes. What may help is a new condition in the MCN or an annotation (I do not know which one would be better) that indicates whether the update was a no-op or not.
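To make the error-reporting path above concrete, a sketch of what the described RenderDegraded condition might look like on the MCP status (the condition type comes from the comment above; the reason and message text are purely illustrative):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
status:
  conditions:
  - type: RenderDegraded
    status: "True"
    reason: MachineConfigValidationFailed   # illustrative reason
    # the message names the first field that violated the rules
    message: 'rendering failed: ignition disks section contains changes'
```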

Contributor


For the second point: the validations are done before the MCD enters the scene, in our MCO controller, and if the MCs pass the validation in the controller, the node "desired" annotation is updated and the MCD starts the update process.

What would happen if the MCO just didn't do these validations? Why were these validations important in the first place?

Contributor


Bit of a longer answer to try to provide background for this (and maybe some other) questions:
When we initially merged Atomic Host and Container Linux to create RHCOS for OCP4, we decided to use ignition (the spec) as the primary way a user defines their system configuration, encapsulated as part of the "MachineConfig" object (which really just started as ignition + an osImage field).

The original design for ignition (as a binary) was that it was to run as part of the initramfs on the very first boot of the CL system, and as such had a lot of freedom to configure whatever it wanted. The spec was mostly designed to support those operations. When we moved that over to OCP, we still ran ignition in the same way in RHCOS, but also added ignition to the MachineConfig spec, and we had a subset of "post-install" fields that we believed we could manage to keep the cluster updated. The MCO would only allow those, which morphed into the validations you see today. (side note: if we were to do this today, we'd probably have this all with API validation, but that wasn't done initially during 4.0 so those legacy fields stayed relatively loose in terms of what you can and can't apply)

This worked fine for the most part, since changing any field that doesn't pass those validations was not very common during the early days. But as OCP 4 grew, and users had older and older clusters, many wanted to introduce new features, such as new hardware, new encryption, etc. We could eventually update the MCO to support some of these operations, but a few critical paths (such as disk partitions) we don't want to do to live systems, and so there became more of a need to allow new nodes to join with different configuration than existing ones. (side note 2: we do technically support doing this, via ignition directly, which is embedded into secrets in the MAPI namespace. However that becomes unmanaged state so we prefer to have users do it through the MCO directly)

So those validations are still important: we don't want the user to change fields we can't support, but we also want a way for users to acknowledge that they need firstboot config served by the MCServer that differs from the configs on existing nodes. This current proposal is the method the MCO team decided was the most straightforward way to get them going. If we believe it necessary, we can create alternatives, but those may require API or architecture changes to how we manage configuration today.
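As background for the side note about Ignition embedded in MAPI secrets, a simplified sketch of the stub Ignition such a user-data secret typically carries, pointing new machines at the MachineConfigServer (the secret name, endpoint URL, and exact layout here are illustrative, not authoritative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-user-data            # conventional name; varies by platform
  namespace: openshift-machine-api
type: Opaque
stringData:
  userData: |
    {
      "ignition": {
        "version": "3.4.0",
        "config": {
          "merge": [
            { "source": "https://api-int.cluster.example.com:22623/config/worker" }
          ]
        }
      }
    }
```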

Comment on lines 69 to 70
* Add a knob in MCO's MachineConfiguration CR to skip non-reconcilable fields
validation if necessary.
Contributor


It sounds a bit more like the goal here is to allow adding new nodes with a different machine configuration than existing nodes. Is this correct?

The current goal here sounds more prescriptive of a how as opposed to a what.

Author


Agree, I have reworked it.

Comment on lines +124 to +135
By setting `machineConfigurationValidationPolicy` to `Relaxed` the customer
acknowledges that providing MCs that make use of Ignition features out of the
scope of the MCO will lead to a cluster with nodes using different Ignition
configurations.
Contributor


What happens to a customer's cluster in this scenario? Can this have negative consequences? How severe are they?

Contributor


In theory, the customer could apply configuration that is valid but not directly supported by the MCO, and think that the MCO applied it to existing nodes since we do not error. In practice, I've not seen anyone do that, and the flag (which will always be non-relaxed by default) is a way for them to acknowledge that their valid, but unsupported-for-existing-node configuration can be used.
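For illustration, opting in might look like the following on the MachineConfiguration CR. The `machineConfigurationValidationPolicy` field name and the `Relaxed` value come from the proposal text; the API group/version and the singleton name are assumptions based on existing operator CRs:

```yaml
apiVersion: operator.openshift.io/v1   # assumed group/version
kind: MachineConfiguration
metadata:
  name: cluster                        # assumed cluster-scoped singleton
spec:
  # proposed field; the strict, validating behavior stays the default
  machineConfigurationValidationPolicy: Relaxed
```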

Contributor


Is there some way that users could be signaled as to what nodes have had the configuration applied to them?

Author


@everettraven Not today, but given that we now have MCNs we can use them to signal users "hey, the configuration didn't change anything", as I state here

- Relaxed: The validation of non-reconcilable fields is skipped and only
the Ignition syntactic validation will be done.

### Risks and Mitigations
Contributor


While it is called out as a non-goal for allowing invalid Ignition/MachineConfig instances, how do you plan to prevent this? Is it possible for this proposed field to be leveraged in a way that could cause this?

Contributor


The MCO always validates all ignition and machineconfigs, and we do not plan to ever support invalid configuration. Invalid in this case means a mistake in the provided spec.


## Alternatives (Not Implemented)

Not applicable.
Contributor


I'm pretty unfamiliar with the MCO, but is there a way to specify that a machine configuration should only apply to a subset of nodes? Any new nodes that match the subset are configured using that machine configuration.

Why is ignoring a specific part of validation the only way to achieve this?

Are users wanting to be able to configure nodes in a way that MCO just doesn't support? Why can't/shouldn't MCO be updated to support these configurations that our customers are asking for?

Contributor


but is there a way to specify that a machine configuration should only apply to a subset of nodes? Any new nodes that match the subset are configured using that machine configuration.

We do have custom pools, but that's not the main concern which we are trying to address here.

Why is ignoring a specific part of validation the only way to achieve this?

This is not the only way to achieve this, but a straightforward and relatively safe way to do so in the context of the MCO. Happy to discuss alternatives.

Are users wanting to be able to configure nodes in a way that MCO just doesn't support? Why can't/shouldn't MCO be updated to support these configurations that our customers are asking for?

Good question. A lot of the configurations that fall under this bucket (in practice) are things we generally want to do once during bootstrap, and not touch it again. While the MCO could potentially implement e.g. repartitioning of disks on existing, running nodes, it's not something we're planning to support, and these configurations will remain bootstrap-time only.

Contributor


We do have custom pools, but that's not the main concern which we are trying to address here.

Reading through this EP, as someone who is fairly familiar with MCO, this is where my mind went. Why aren't we solving being able to boot into custom MCPs?

If a user could boot directly into a custom MCP, then they would be able to take everything from worker, create worker-new, and then migrate, through creation of new machines, their entire cluster over to the new configuration.

It would avoid having to have exceptions, and it would avoid the "we say we are on currentConfig A, but actually that's not true" awkwardness of this current design

I believe it would probably also solve some other RFEs; I'm sure the question of "can we boot directly into a non-worker MCP" has come up several times in my tenure at RH

I would expect if this has been discussed, and dismissed as a viable alternative, then that would be detailed in this section to explain why this is either not a valid solution, or, why the proposed solution is better.

This is not the only way to achieve this, but a straightforward and relatively safe way to do so in the context of the MCO. Happy to discuss alternatives.

It would be helpful to detail the other ways you're aware of here, so that we can understand why the proposed solution is the best solution

A lot of the configurations that fall under this bucket (in practice) are things we generally want to do once during bootstrap, and not touch it again

I realise that this is something customers can already achieve, so dropping support isn't possible. But would it have been better/would it be better in the future if we steered users, for the irreconcilable stuff, to drive that config through stub ignitions that they pass to MAPI (or equivalent) rather than driving them through the MCO?

If I set kernel arguments in the user-data-secret in MAPI today, would the MCO's served ignition overwrite that configuration, or would my customisations take precedence?

I have a rough plan for "node specific configuration" to be generated into a stub ignition in the future, so potentially these kinds of boot time configs could go into that, and that could become the preferred way to set these in the future?

Comment on lines +115 to +116
After all the changes are applied, the MCD updates the Node annotations to
point `machineconfiguration.openshift.io/currentConfig` to the updated one and
Contributor


Wouldn't this be misleading to users if we update this annotation on existing nodes that aren't actually configured using the "updated" config?

Author


Well, the configuration, from the MCO point of view, is applied. Every field supported by the MCO will be updated on "old" nodes too; the key is that configuration that Ignition uses will never be updated, and it's not expected to be supported. For changes that only contain non-supported fields (aka the ones that Ignition uses at installation time) the MCD will not see anything to update and will just set the current config and the state.
Keep in mind that it is possible for a user to define a config with a mix of both scenarios and the behavior will be the same: the MCD will just update what it knows.
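To illustrate the mixed scenario described above, a hypothetical MachineConfig combining an MCD-supported field (kernel arguments) with an install-time-only Ignition section (disk partitions); the names and values are invented for the example:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-mixed-example        # hypothetical
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:                     # supported: rolled out to existing nodes
  - console=ttyS0,115200
  config:
    ignition:
      version: 3.4.0
    storage:
      disks:                           # install-time only: a no-op on existing
      - device: /dev/sdb               # nodes; consumed by Ignition on the first
        partitions:                    # boot of new nodes joining the cluster
        - label: data
          startMiB: 0                  # 0 lets Ignition place the partition
          sizeMiB: 0                   # 0 means "use the remaining space"
```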

Contributor


What I care about here is the point of view of the users. I'm not sure it really matters if the MCO views the configuration as having been applied if users don't share that same view point. Is the semantic here well documented such that users won't be confused in this situation?

Author


@everettraven So, keep in mind this is just a backdoor for customers to skip the validation under very specific circumstances. Given that, and given we will need to document this feature as "use it with care, these are the implications and the use cases", one thing we can do to make everything a bit more verbose is to add a few fields to our MCN CR statuses to let the user know how many configurations we have skipped and whether the currently applied one changed something/was skipped.
I must agree this feature needs proper, well-written docs that tell users what to expect and how exceptional the usage of this feature should be.

Contributor


What are some of those implications? Does that include a user's cluster being marked as entering an unsupported state? How would a user, once entering that unsupported state, be able to roll back to a supported state?

Contributor


How could you report the differences between the applied configuration and the currentConfig so that a user would be able to check a specific node to see if there are any legacy configurations that have not been updated?

The user story I'm thinking about here is, if I make a change, I want to make sure every node gets that change, it's important for example that every disk now has a new Passwd.

If MCO updates the currentConfig for every node, how do I now identify the nodes which have the old value for Passwd so that I can go and recreate them?

Author


Maybe I did not pick the proper wording and did not explain how what we have now works and the small change this feature introduces, but let me try to explain it based on both answers.
First, we do not manage, and will not manage, what we have applied to a node in such an atomic way. We only know (and will know) if a whole rendered MC (the result of merging all the MCs for a given pool) was applied or not, and only one; you cannot "enqueue" many changes. This means:

  • All nodes have a current configuration ID, pointing to the latest successfully applied rendered MC
  • If a pool is rolling out an update, all nodes under that pool umbrella will point their desired MC to the rendered MC of the update. The current ID won't change till the update succeeds. If the update fails, e.g. on a validation, the desired ID won't be reverted to match the current ID.
  • If a pool reaches the degraded state, because of e.g. an invalid config, the next change to the MCs of the pool won't add an update on top of the previous one; the new rendered MC will replace the desired one and the MCDs will try to apply that one instead, leaving the previous unsuccessful one stale.

So, to your points @JoelSpeed. The user has the guarantee that all the fields the MCO (and the MCD in particular) manages will always be updated, no exception. This proposed flag allows users to tell the MCO "I know I'm adding changes not supported by the MCD to my Ignition, but I'm ok with the MCD ignoring them". So, for example, if the user sets the Passwd, which is supported by the MCD, then he has the guarantee that if the update succeeds the Passwd will be updated on all the nodes of the pool, but if he, for example, added a partition, nothing will happen. In case the user mixes "supported" and "unsupported" changes, all the supported ones will be applied as usual, but the "unsupported" ones will be a no-op for the current nodes and only new nodes joining the cluster will see them, as their Ignition boot will take and apply them.

Contributor


We only know (and will know) if a whole rendered MC (the result of merging all the MCs for a given pool) was applied or not, and only one; you cannot "enqueue" many changes.

This sounds like a valuable thing for the system to be able to assert. As far as I can tell, the current proposal drops this assertion, doesn't it? You will no longer be able to say for certain that the rendered config matches the node state?

"I know I'm adding changes not supported by the MCD to my Ignition, but I'm ok with the MCD ignoring them"

And are we confident that our users would understand what is and isn't supported by the MCO? Where would the unsupported changes be documented?

So, for example, if the user sets the Passwd, which is supported by the MCD, then he has the guarantee that if the update succeeds the Passwd will be updated on all the nodes of the pool

I used Passwd in my example as I thought I had seen written somewhere that the Passwd was something MCD couldn't update over time. So, my question stands then as, if a user makes a change to the partition scheme of an MCP, and they flag to allow MCD to make changes that aren't reconcilable, how as an end user, would I know which of my Nodes have the new partition scheme, and which have the old scheme?

Author

@pablintino pablintino May 12, 2025


This sounds like a valuable thing for the system to be able to assert. As far as I can tell, the current proposal drops this assertion doesn't it? You will no longer be able to say for certain that the rendered config matches the node state?

Yes and no; this proposal just allows the user to skip the last part of our validations (the ones that assert that fields not managed by the MCO/MCD are not touched since the original Ignition config), so the user can take for granted that the nodes match the MCs state for everything that we manage. For the rest of the Ignition fields? Correct, already existing nodes will never match the state for those fields, as we cannot implement the whole Ignition spec in the MCD right now.

And are we confident that our users would understand what is an isn't supported by MCO? Where would the unsupported changes be documented?

Well, I think it is easier and faster to point users to what's supported, like we do in our docs. The fields listed in that section are all we support updating via the MCD as day-2 ops; the rest is considered unsupported.
Some of the non-supported fields have a dedicated check in validation, but others do not, like LUKS. A user today can add an MC that contains LUKS changes and the MCO won't complain; the MCD will just apply what it knows and the update will succeed.

I used Passwd in my example as I thought I had seen written somewhere that the Passwd was something MCD couldn't update over time. So, my question stands then as, if a user makes a change to the partition scheme of an MCP, and they flag to allow MCD to make changes that aren't reconcilable, how as an end user, would I know which of my Nodes have the new partition scheme, and which have the old scheme?

Based on the original proposal (the current state of this PR) there's no way other than assuming that nodes that joined the cluster before the change won't have it, but I must admit that's not desirable, and what @everettraven and you suggest, adding some status/state reporting for this, makes total sense. I think the best thing we can do is to use the new MCN CRs and add there a dedicated status field that tells the user if the latest config was skipped or not and the amount of configs the node has skipped from day zero. If we want a finer-grained report the only thing that comes to my mind is to add another field pointing to a list of the MCs we skipped.

Contributor


I think the best thing we can do is to use the new MCN CRs and add there a dedicated status field that tells the user if the latest config was skipped or not and the amount of configs the node has skipped from day zero. If we want a finer-grained report the only thing that comes to my mind is to add another field pointing to a list of the MCs we skipped.

What would it take to be able to report an exact set of changes that were not applied? Has any spike evaluated the complexity of that?

I would imagine that you can diff the current and desired configs to identify the set of changes, and then determine from those which are and aren't going to be applied (as you say, probably easier to check those that are supported than not), and then potentially, could that diff be reported on status for the remaining life of the MCN?
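A purely hypothetical sketch of what such MCN-based reporting could look like; none of these status fields exist today, and the names are invented solely for discussion:

```yaml
apiVersion: machineconfiguration.openshift.io/v1alpha1   # assumed MCN version
kind: MachineConfigNode
metadata:
  name: worker-0
status:
  # hypothetical fields sketching the reporting discussed in this thread
  irreconcilableChangesSkipped: true      # latest config contained skipped fields
  skippedConfigCount: 3                   # configs skipped since day zero
  skippedMachineConfigs:                  # finer-grained list of skipped MCs
  - 99-worker-partition-change
```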

@pablintino pablintino changed the title MCO-1002: MachineConfig Non-Reconcilable Changes MCO-1676: MachineConfig Non-Reconcilable Changes May 6, 2025
@openshift-ci-robot

openshift-ci-robot commented May 6, 2025

@pablintino: This pull request references MCO-1676 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

This enhancement adds the ability to skip non-reconcilable-changes validation in the MCO.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


OCP OS configuration is driven by Ignition, which performs a one-shot
configuration of the OS based on the Ignition spec of the MCO Pool each node
belongs to. Once the user configures install-time parameters, the MCO will
Contributor


Please define Install-time parameters somewhere (a glossary?) and provide a list of examples of such configs

Author


I'll add an explanation of what install-time parameters means with some examples. Keep in mind that install-time parameters just means all supported Ignition fields, so I'll try to pick useful ones without mentioning all of them.

Contributor


I think the key part of that explanation is helping readers understand what kind of changes we expect our users to make on a frequent(?) basis that would trigger the requirement of this feature, but yes, that sounds like you'll cover that

For these users, their only real option today would be to re-provision their
cluster with new install-time configuration, which is costly and time
consuming. We would like to introduce the ability for users in these scenarios
to instruct the MCO to allow for unreconcilable MachineConfig changes to be
Contributor


Nit: You have both non-reconcilable and unreconcilable and I believe I saw irreconcilable as well on the API. We should choose one term and be consistent

belongs to. Once the user configures install-time parameters, the MCO will
prevent any further changes to non-reconcilable fields. While this is
generally useful for safety, it becomes problematic for any user who wishes
to change install-time only parameters, such as disk partition schema.
Contributor


Just trying to understand the flow here:

  • The disk partition schema today is part of the MachineConfiguration that gets rendered by MCC
  • When a node boots, it gets this first ignition, and configures its disks
  • If a user later changes the disk layout, MCO decides, I can't achieve that on existing nodes, and ... What does it do? How does a user see what happens next?

Given the above, and the fact that the MCO boots every node as either control-plane or worker (we don't allow booting directly into other MCPs), it feels like this is a severe limitation right now, no? I have to pick one set of install-time parameters for all nodes that are worker (or infra, or any other class I could create an MCP for) and have that consistent for the lifecycle of the cluster?

^ This impact may be worth noting as part of the "why is this a problem section"

applied, skipping existing nodes, and serve this configuration to new nodes
joining the cluster. Invalid ignition is not considered in this case.

### User Stories
Contributor


Beyond disk partitions, what breaks this today? Your use cases are all about disk configuration


### Goals

* Allow users to provide MCs with non-reconcilable fields for the use cases
Contributor


An exhaustive list of the scenarios considered irreconcilable today would be useful


### Test Plan

MCO e2e tests and unit tests will cover this functionality.
Contributor


A test plan should be more detailed than this. What specific parts of the functionality will you test, and how?


### Graduation Criteria

This feature is behind the tech-preview FeatureGate in 4.20.
Contributor


What gate?


## Tech Preview -> GA

Bugs found by e2e tests and QE are .
Contributor


Sentence ends abruptly


### Operational Aspects of API Extensions

#### Failure Modes
Contributor


What would happen if I had set the irreconcilable configuration to Relaxed, completed a round of updates, and later turned it back to Strict?


