
MCO-1676: MachineConfig Non-Reconcilable Changes #1785


Draft
wants to merge 1 commit into base: master

Conversation

pablintino

This enhancement adds the ability to skip non-reconcilable-changes validation in the MCO.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 30, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 30, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 30, 2025

@pablintino: This pull request references MCO-1002 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

In response to this:

This enhancement adds the ability to skip non-reconcilable-changes validation in the MCO.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Apr 30, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Apr 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign coverprice for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@yuqi-zhang yuqi-zhang left a comment


Generally makes sense to me! Some comments inline

why it will need to be skipped under certain, specific circumstances. There are
known customer use-cases that require making non-reconcilable MC changes.

## Motivation
Contributor


For the motivation section, I think we can talk a bit more about the limitations today in terms of use cases. Maybe a snippet after the non-reconcileable changes like:

Once you configure install-time parameters, the MCO will prevent any further changes to non-reconcileable fields. While this is generally useful for safety, it becomes problematic for any user who wishes to change install-time only parameters, such as disk partition schema. In the worst case, this would prevent scaleup of new nodes with any differences incompatible with existing MachineConfigs.

For these users, their only real option today would be to reprovision their cluster with new install-time configuration, which is costly and time consuming. We would like to introduce the ability for users in these scenarios to instruct the MCO to allow for unreconcileable MachineConfig changes to be applied, skipping existing nodes, and serve this configuration to new nodes joining the cluster. Invalid ignition is not considered in this case.

Just a suggestion, feel free to adapt this. This way we can shorten the motivation section a bit and be a bit more clear that the issue isn't necessarily allowing non-supported changes, but to work around them with user acknowledgement.

Author


A good suggestion! I have used it directly because, to be honest, I do not think I have a better way to express it. To orient readers (influenced by other comments in the PR), I added a small introduction sentence to state that Ignition is the one doing the initial one-off config.


* As a cluster admin, I am adding new nodes to a long-standing cluster and I
would like to change the partitions for the new nodes.
* As a cluster admin, I am adding new nodes to a long-standing cluster and the
Contributor


kernel argument changes are generally allowed. Perhaps we can use another motivation here

Author


Got it! Changed!

## Proposal

Update the MachineConfiguration CR to add a new field in the spec to give
users the ability to disable MC validation. The field will default to the
Contributor


Instead of disabling MC validation, maybe we should phrase it as "allow irreconcileable MachineConfig changes to bypass validation"

Author


Nice. I've rewritten the whole sentence to make it a bit shorter and more readable, including your changes.


The MachineConfig Controller and the MachineConfigDaemons will
read the new field at runtime, and if the value explicitly states that the
validation should be skipped they will let the MC pass and get applied to
Contributor


We should clarify that the MCD would not actually perform a node update for any irreconcileable fields that were changed, but instead just apply the regularly supported fields, and only update the current node annotations and report the node as updated if all the changes were irreconcileable originally.

Author


Done.

validation should be skipped they will let the MC pass and get applied to
the nodes.

### Workflow Description
Contributor


We should talk about how this config would apply to existing nodes. I think we'd still update their annotations, but have the daemon recognize it's a no-op update. WDYT?

Author

@pablintino pablintino May 1, 2025


Good one that I didn't think about.
If I did my research properly, the annotations we will look at/update are the ones this code updates, like machineconfiguration.openshift.io/state, machineconfiguration.openshift.io/currentConfig and machineconfiguration.openshift.io/desiredConfig, right?
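For reference, a minimal sketch of how those annotations might look on a Node object mid-update (the node name and rendered-config names are hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-0   # hypothetical node name
  annotations:
    # rendered MachineConfig currently applied to the node
    machineconfiguration.openshift.io/currentConfig: rendered-worker-aaa111
    # rendered MachineConfig the node should converge to
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-bbb222
    # MCD progress state, e.g. Working while an update is in flight
    machineconfiguration.openshift.io/state: Working
```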

Author


I've updated it to explicitly state what happens with the state and the current config annotations.

Contributor


I've got some more questions to try and get a bit more clarity here:

  • Would updating annotations for existing nodes potentially send a false-positive signal to users that a node has been updated to reflect their desired changes?
  • If you are able to have the daemon recognize no-op updates, why do you need to have these validations in the first place?
    • How are failures of these validations communicated to users today?
    • Could we leverage an existing communication layer to inform users that the changes they are making will only be applied to new nodes and ignored for existing nodes?

Author

@pablintino pablintino May 2, 2025


I think the first point is related to, and answered by, my answer to "Wouldn't this be misleading to users if we update this annotation on existing nodes that aren't actually configured using the "updated" config?".
For the second point: the validations are done before the MCD enters the scene, in our MCO controller, and if the MCs pass the validation in the controller, the node "desired" annotation is updated and the MCD starts the update process.
About the reporting of the errors: we report the irreconcilable error (the first field to violate the rules) in the MCP CR by adding the RenderDegraded condition to the status. The condition message has the details of which field is making the validation fail.
About the last point: I have the feeling that giving that information will add more confusion, as it may make the user think that new configs cannot target "old" nodes. What may help is a new condition in the MCN or an annotation (I do not know which one would be better) that indicates whether the update was a no-op or not.
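To make the error-reporting path above concrete, a sketch of what the described RenderDegraded condition might look like on the MCP status (the condition type comes from the comment above; the reason and message text are purely illustrative):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
status:
  conditions:
  - type: RenderDegraded
    status: "True"
    reason: MachineConfigValidationFailed   # illustrative reason
    # the message names the first field that violated the rules
    message: 'rendering failed: ignition disks section contains changes'
```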

Contributor


For the second point: the validations are done before the MCD enters the scene, in our MCO controller, and if the MCs pass the validation in the controller, the node "desired" annotation is updated and the MCD starts the update process.

What would happen if the MCO just didn't do these validations? Why were these validations important in the first place?

Contributor


Bit of a longer answer to try to provide background for this (and maybe some other) questions:
When we initially merged Atomic Host and Container Linux to create RHCOS for OCP4, we decided to use ignition (the spec) as the primary way a user defines their system configuration, encapsulated as part of the "MachineConfig" object (which really just started as ignition + an osImage field).

The original design for ignition (as a binary) was that it was to run as part of the initramfs on the very first boot of the CL system, and as such had a lot of freedom to configure whatever it wanted. The spec was mostly designed to support those operations. When we moved that over to OCP, we still ran ignition in the same way in RHCOS, but also added ignition to the MachineConfig spec, and we had a subset of "post-install" fields that we believed we could manage to keep the cluster updated. The MCO would only allow those, which morphed into the validations you see today. (side note: if we were to do this today, we'd probably have this all with API validation, but that wasn't done initially during 4.0 so those legacy fields stayed relatively loose in terms of what you can and can't apply)

This worked fine for the most part, since changing any field that doesn't pass those validations was not very common during the early days. But as OCP 4 grew, and users had older and older clusters, many wanted to introduce new features, such as new hardware, new encryption, etc. We could eventually update the MCO to support some of these operations, but a few critical paths (such as disk partitions) we don't want to do to live systems, and so there became more of a need to allow new nodes to join with different configuration than existing ones. (side note 2: we do technically support doing this, via ignition directly, which is embedded into secrets in the MAPI namespace. However that becomes unmanaged state so we prefer to have users do it through the MCO directly)

So those validations are still important: we don't want the user to change fields we can't support, but we also want a way for users to acknowledge that they need firstboot config served by the MCServer that differs from the configs on existing nodes. This current proposal is the method the MCO team decided was the most straightforward way to get them going. If we believe it necessary, we can create alternatives, but those may require API or architecture changes to how we manage configuration today.
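As background for the side note about Ignition embedded in MAPI secrets, a simplified sketch of the stub Ignition such a user-data secret typically carries, pointing new machines at the MachineConfigServer (the secret name, endpoint URL, and exact layout here are illustrative, not authoritative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-user-data            # conventional name; varies by platform
  namespace: openshift-machine-api
type: Opaque
stringData:
  userData: |
    {
      "ignition": {
        "version": "3.4.0",
        "config": {
          "merge": [
            { "source": "https://api-int.cluster.example.com:22623/config/worker" }
          ]
        }
      }
    }
```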

Comment on lines 69 to 70
* Add a knob in MCO's MachineConfiguration CR to skip non-reconcilable fields
validation if necessary.
Contributor


It sounds a bit more like the goal here is to allow adding new nodes with a different machine configuration than existing nodes. Is this correct?

The current goal here sounds more prescriptive of a how as opposed to a what.

Author


Agree, I have reworked it.

Comment on lines +124 to +135
By setting `machineConfigurationValidationPolicy` to `Relaxed` the customer
acknowledges that providing MCs that make use of Ignition features out of the
scope of the MCO will lead to a cluster with nodes using different Ignition
configurations.
Contributor


What happens to a customer's cluster in this scenario? Can this have negative consequences? How severe are they?

Contributor


In theory, the customer could apply configuration that is valid but not directly supported by the MCO, and think that the MCO applied it to existing nodes since we do not error. In practice, I've not seen anyone do that, and the flag (which will always be non-relaxed by default) is a way for them to acknowledge that their valid, but unsupported-for-existing-node configuration can be used.
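For illustration, opting in might look like the following on the MachineConfiguration CR. The `machineConfigurationValidationPolicy` field name and the `Relaxed` value come from the proposal text; the API group/version and the singleton name are assumptions based on existing operator CRs:

```yaml
apiVersion: operator.openshift.io/v1   # assumed group/version
kind: MachineConfiguration
metadata:
  name: cluster                        # assumed cluster-scoped singleton
spec:
  # proposed field; the strict, validating behavior stays the default
  machineConfigurationValidationPolicy: Relaxed
```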

Contributor


Is there some way that users could be signaled as to what nodes have had the configuration applied to them?

Author


@everettraven Not today, but given that we now have MCNs we can use them to signal users "hey, the configuration didn't change anything", as I state here

- Relaxed: The validation of non-reconcilable fields is skipped and only
the Ignition syntactic validation will be done.

### Risks and Mitigations
Contributor


While it is called out as a non-goal for allowing invalid Ignition/MachineConfig instances, how do you plan to prevent this? Is it possible for this proposed field to be leveraged in a way that could cause this?

Contributor


The MCO always validates all ignition and machineconfigs, and we do not plan to ever support invalid configuration. Invalid in this case means a mistake in the provided spec.


## Alternatives (Not Implemented)

Not applicable.
Contributor


I'm pretty unfamiliar with the MCO, but is there a way to specify that a machine configuration should only apply to a subset of nodes? Any new nodes that match the subset are configured using that machine configuration.

Why is ignoring a specific part of validation the only way to achieve this?

Are users wanting to be able to configure nodes in a way that MCO just doesn't support? Why can't/shouldn't MCO be updated to support these configurations that our customers are asking for?

Contributor


but is there a way to specify that a machine configuration should only apply to a subset of nodes? Any new nodes that match the subset are configured using that machine configuration.

We do have custom pools, but that's not the main concern which we are trying to address here.

Why is ignoring a specific part of validation the only way to achieve this?

This is not the only way to achieve this, but a straightforward and relatively safe way to do so in the context of the MCO. Happy to discuss alternatives.

Are users wanting to be able to configure nodes in a way that MCO just doesn't support? Why can't/shouldn't MCO be updated to support these configurations that our customers are asking for?

Good question. A lot of the configurations that fall under this bucket (in practice) are things we generally want to do once during bootstrap, and not touch it again. While the MCO could potentially implement e.g. repartitioning of disks on existing, running nodes, it's not something we're planning to support, and these configurations will remain bootstrap-time only.

Contributor


We do have custom pools, but that's not the main concern which we are trying to address here.

Reading through this EP, as someone who is fairly familiar with MCO, this is where my mind went. Why aren't we solving being able to boot into custom MCPs?

If a user could boot directly into a custom MCP, then they would be able to take everything from worker, create worker-new, and then migrate, through creation of new machines, their entire cluster over to the new configuration.

It would avoid having to have exceptions, and it would avoid the "we say we are on currentConfig A, but actually that's not true" awkwardness of this current design

I believe it would probably also solve some other RFEs; I'm sure the question of "can we boot directly into a non-worker MCP" has come up several times in my tenure at RH

I would expect if this has been discussed, and dismissed as a viable alternative, then that would be detailed in this section to explain why this is either not a valid solution, or, why the proposed solution is better.

This is not the only way to achieve this, but a straightforward and relatively safe way to do so in the context of the MCO. Happy to discuss alternatives.

It would be helpful to detail the other ways you're aware of here, so that we can understand why the proposed solution is the best solution

A lot of the configurations that fall under this bucket (in practice) are things we generally want to do once during bootstrap, and not touch it again

I realise that this is something customers can already achieve, so dropping support isn't possible. But would it have been better/would it be better in the future if we steered users, for the irreconcilable stuff, to drive that config through stub ignitions that they pass to MAPI (or equivalent) rather than driving them through the MCO?

If I set kernel arguments in the user-data-secret in MAPI today, would the MCO's served ignition overwrite that configuration, or would my customisations take precedence?

I have a rough plan for "node specific configuration" to be generated into a stub ignition in the future, so potentially these kinds of boot time configs could go into that, and that could become the preferred way to set these in the future?

Comment on lines +115 to +116
After all the changes are applied, the MCD updates the Node annotations to
point `machineconfiguration.openshift.io/currentConfig` to the updated one and
Contributor


Wouldn't this be misleading to users if we update this annotation on existing nodes that aren't actually configured using the "updated" config?

Author


Well, the configuration, from the MCO point of view, is applied. Every field supported by the MCO will be updated on "old" nodes too; the key is that configuration that Ignition uses will never be updated, and it's not expected to be supported. For changes that only contain non-supported fields (aka the ones that Ignition uses at installation time) the MCD will not see anything to update and will just set the current config and the state.
Keep in mind that it is possible for a user to define a config with a mix of both scenarios and the behavior will be the same: the MCD will just update what it knows.
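To illustrate the mixed scenario described above, a hypothetical MachineConfig combining an MCD-supported field (kernel arguments) with an install-time-only Ignition section (disk partitions); the names and values are invented for the example:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-mixed-example        # hypothetical
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:                     # supported: rolled out to existing nodes
  - console=ttyS0,115200
  config:
    ignition:
      version: 3.4.0
    storage:
      disks:                           # install-time only: a no-op on existing
      - device: /dev/sdb               # nodes; consumed by Ignition on the first
        partitions:                    # boot of new nodes joining the cluster
        - label: data
          startMiB: 0                  # 0 lets Ignition place the partition
          sizeMiB: 0                   # 0 means "use the remaining space"
```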

Contributor


What I care about here is the point of view of the users. I'm not sure it really matters if the MCO views the configuration as having been applied if users don't share that same view point. Is the semantic here well documented such that users won't be confused in this situation?

Author


@everettraven So, keep in mind this is just a backdoor for customers to skip the validation under very specific circumstances. Given that, and given we will need to document this feature as "use it with care, these are the implications and the use cases", one thing we can do to make everything a bit more verbose is to add a few fields to our MCN CR statuses to let the user know how many configurations we have skipped and whether the currently applied one changed something/was skipped.
I must agree this feature needs proper, well-written docs that tell users what to expect and how exceptional the usage of this feature should be.

Contributor


What are some of those implications? Does that include a user's cluster being marked as entering an unsupported state? How would a user, once entering that unsupported state, be able to roll back to a supported state?

Contributor


How could you report the differences between the applied configuration and the currentConfig so that a user would be able to check a specific node to see if there are any legacy configurations that have not been updated?

The user story I'm thinking about here is, if I make a change, I want to make sure every node gets that change, it's important for example that every disk now has a new Passwd.

If MCO updates the currentConfig for every node, how do I now identify the nodes which have the old value for Passwd so that I can go and recreate them?

Author


Maybe I did not pick the proper wording and did not explain how what we have now works and the small change this feature introduces, but let me try to explain it based on both answers.
First, we do not manage, and will not manage, what we have applied to a node in such an atomic way. We only know (and will know) if a whole rendered MC (the result of merging all the MCs for a given pool) was applied or not, and only one; you cannot "enqueue" many changes. This means:

  • All nodes have a current configuration ID, pointing to the latest successfully applied rendered MC
  • If a pool is rolling out an update, all nodes under that pool umbrella will point their desired MC to the rendered MC of the update. The current ID won't change till the update succeeds. If the update fails, e.g. on a validation, the desired ID won't be reverted to match the current ID.
  • If a pool reaches the degraded state, because of e.g. an invalid config, the next change to the MCs of the pool won't add an update on top of the previous one; the new rendered MC will replace the desired one and the MCDs will try to apply that one instead, leaving the previous unsuccessful one stale.

So, to your points @JoelSpeed. The user has the guarantee that all the fields the MCO (and the MCD in particular) manages will always be updated, no exception. This proposed flag allows users to tell the MCO "I know I'm adding changes not supported by the MCD to my Ignition, but I'm ok with the MCD ignoring them". So, for example, if the user sets the Passwd, which is supported by the MCD, then he has the guarantee that if the update succeeds the Passwd will be updated on all the nodes of the pool, but if he, for example, added a partition, nothing will happen. In case the user mixes "supported" and "unsupported" changes, all the supported ones will be applied as usual, but the "unsupported" ones will be a no-op for the current nodes and only new nodes joining the cluster will see them, as their Ignition boot will take and apply them.

Contributor


We only know (and will know) if a whole rendered MC (the result of merging all the MCs for a given pool) was applied or not, and only one; you cannot "enqueue" many changes.

This sounds like a valuable thing for the system to be able to assert. As far as I can tell, the current proposal drops this assertion, doesn't it? You will no longer be able to say for certain that the rendered config matches the node state?

"I know I'm adding changes not supported by the MCD to my Ignition, but I'm ok with the MCD ignoring them"

And are we confident that our users would understand what is and isn't supported by the MCO? Where would the unsupported changes be documented?

So, for example, if the user sets the Passwd, which is supported by the MCD, then he has the guarantee that if the update succeeds the Passwd will be updated on all the nodes of the pool

I used Passwd in my example as I thought I had seen written somewhere that the Passwd was something MCD couldn't update over time. So, my question stands then as, if a user makes a change to the partition scheme of an MCP, and they flag to allow MCD to make changes that aren't reconcilable, how as an end user, would I know which of my Nodes have the new partition scheme, and which have the old scheme?

Author

@pablintino pablintino May 12, 2025


This sounds like a valuable thing for the system to be able to assert. As far as I can tell, the current proposal drops this assertion doesn't it? You will no longer be able to say for certain that the rendered config matches the node state?

Yes and no; this proposal just allows the user to skip the last part of our validations (the ones that assert that fields not managed by the MCO/MCD are not touched since the original Ignition config), so the user can take for granted that the nodes match the MCs state for everything that we manage. For the rest of the Ignition fields? Correct, already existing nodes will never match the state for those fields, as we cannot implement the whole Ignition spec in the MCD right now.

And are we confident that our users would understand what is an isn't supported by MCO? Where would the unsupported changes be documented?

Well, I think it is easier and faster to point users to what's supported, like we do in our docs. The fields listed in that section are all we support updating via the MCD as day-2 ops; the rest is considered unsupported.
Some of the non-supported fields have a dedicated check in validation, but others do not, like LUKS. A user today can add an MC that contains LUKS changes and the MCO won't complain; the MCD will just apply what it knows and the update will succeed.

I used Passwd in my example as I thought I had seen written somewhere that the Passwd was something MCD couldn't update over time. So, my question stands then as, if a user makes a change to the partition scheme of an MCP, and they flag to allow MCD to make changes that aren't reconcilable, how as an end user, would I know which of my Nodes have the new partition scheme, and which have the old scheme?

Based on the original proposal (the current state of this PR) there's no way other than assuming that nodes that joined the cluster before the change won't have it, but I must admit that's not desirable, and what @everettraven and you suggest, adding some status/state reporting for this, makes total sense. I think the best thing we can do is to use the new MCN CRs and add there a dedicated status field that tells the user if the latest config was skipped or not and the amount of configs the node has skipped from day zero. If we want a finer-grained report the only thing that comes to my mind is to add another field pointing to a list of the MCs we skipped.

Contributor


I think the best thing we can do is to use the new MCN CRs and add there a dedicated status field that tells the user if the latest config was skipped or not and the amount of configs the node has skipped from day zero. If we want a finer-grained report the only thing that comes to my mind is to add another field pointing to a list of the MCs we skipped.

What would it take to be able to report an exact set of changes that were not applied? Has any spike evaluated the complexity of that?

I would imagine that you can diff the current and desired configs to identify the set of changes, and then determine from those which are and aren't going to be applied (as you say, probably easier to check those that are supported than not), and then potentially, could that diff be reported on status for the remaining life of the MCN?
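A purely hypothetical sketch of what such MCN-based reporting could look like; none of these status fields exist today, and the names are invented solely for discussion:

```yaml
apiVersion: machineconfiguration.openshift.io/v1alpha1   # assumed MCN version
kind: MachineConfigNode
metadata:
  name: worker-0
status:
  # hypothetical fields sketching the reporting discussed in this thread
  irreconcilableChangesSkipped: true      # latest config contained skipped fields
  skippedConfigCount: 3                   # configs skipped since day zero
  skippedMachineConfigs:                  # finer-grained list of skipped MCs
  - 99-worker-partition-change
```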

@pablintino pablintino changed the title MCO-1002: MachineConfig Non-Reconcilable Changes MCO-1676: MachineConfig Non-Reconcilable Changes May 6, 2025
@openshift-ci-robot

openshift-ci-robot commented May 6, 2025

@pablintino: This pull request references MCO-1676 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

This enhancement adds the ability to skip non-reconcilable-changes validation in the MCO.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


OCP OS configuration is driven by Ignition, which performs a one-shot
configuration of the OS based on the Ignition spec of the MCO Pool each node
belongs to. Once the user configures install-time parameters, the MCO will
Contributor


Please define Install-time parameters somewhere (a glossary?) and provide a list of examples of such configs

Author


I'll add an explanation of what install-time parameters means with some examples. Keep in mind that install-time parameters just means all supported Ignition fields, so I'll try to pick useful ones without mentioning all of them.

Contributor


I think the key part of that explanation is helping readers understand what kind of changes we expect our users to make on a frequent(?) basis that would trigger the requirement of this feature, but yes, that sounds like you'll cover that

For these users, their only real option today would be to re-provision their
cluster with new install-time configuration, which is costly and time
consuming. We would like to introduce the ability for users in these scenarios
to instruct the MCO to allow for unreconcilable MachineConfig changes to be
Contributor


Nit: You have both non-reconcilable and unreconcilable and I believe I saw irreconcilable as well on the API. We should choose one term and be consistent

belongs to. Once the user configures install-time parameters, the MCO will
prevent any further changes to non-reconcilable fields. While this is
generally useful for safety, it becomes problematic for any user who wishes
to change install-time only parameters, such as disk partition schema.
Contributor


Just trying to understand the flow here:

  • The disk partition schema today is part of the MachineConfiguration that gets rendered by MCC
  • When a node boots, it gets this first ignition, and configures its disks
  • If a user later changes the disk layout, MCO decides, I can't achieve that on existing nodes, and ... What does it do? How does a user see what happens next?

Given the above, and the fact that the MCO boots every node as either control-plane or worker (we don't allow booting directly into other MCPs), it feels like this is a severe limitation right now, no? I have to pick one set of install-time parameters for all nodes that are worker (or infra, or any other class I could create an MCP for) and have that consistent for the lifecycle of the cluster?

^ This impact may be worth noting as part of the "why is this a problem section"

applied, skipping existing nodes, and serve this configuration to new nodes
joining the cluster. Invalid ignition is not considered in this case.

### User Stories
Contributor


Beyond disk partitions, what breaks this today? Your use cases are all about disk configuration


### Goals

* Allow users to provide MCs with non-reconcilable fields for the use cases
Contributor


An exhaustive list of the scenarios considered irreconcilable today would be useful


### Test Plan

MCO e2e tests and unit tests will cover this functionality.
Contributor


A test plan should be more detailed than this. What specific parts of the functionality will you test, and how?


### Graduation Criteria

This feature is behind the tech-preview FeatureGate in 4.20.
Contributor


What gate?


## Tech Preview -> GA

Bugs found by e2e tests and QE are .
Contributor


Sentence ends abruptly


### Operational Aspects of API Extensions

#### Failure Modes
Contributor


What would happen if I had set the irreconcilable configuration to Relaxed, completed a round of updates, and later turned it back to Strict?


