|
| 1 | +--- |
| 2 | +title: machine-config-non-reconcilable-changes |
| 3 | +authors: |
| 4 | + - "@pablintino" |
| 5 | +reviewers: |
| 6 | + - "@yuqi-zhang" |
| 7 | +approvers: |
| 8 | + - "@yuqi-zhang" |
| 9 | +api-approvers: |
| 10 | + - "@JoelSpeed" |
| 11 | +creation-date: 2025-04-23 |
| 12 | +tracking-link: |
| 13 | + - https://issues.redhat.com/browse/MCO-1002 |
| 14 | +see-also: |
| 15 | +replaces: |
| 16 | +superseded-by: |
| 17 | +--- |
| 18 | + |
| 19 | +# MachineConfig Non-Reconcilable Changes |
| 20 | + |
| 21 | +## Summary |
| 22 | + |
| 23 | +This enhancement describes the context around the MCO validation of MC CRs and |
| 24 | +why it needs to be skipped under certain, specific circumstances. There |
| 25 | +are known customer use cases that require making non-reconcilable MC changes. |
| 26 | + |
| 27 | +## Motivation |
| 28 | + |
| 29 | +The MCO performs an internal validation of the rendered MC applied to each |
| 30 | +pool before applying it to the nodes. The MCO validation process can be split |
| 31 | +into three phases: |
| 32 | + |
| 33 | +1. Parse the Ignition raw configuration. |
| 34 | +2. Ensure the Ignition configuration is valid. |
| 35 | +3. Ensure there are no changes to non-reconcilable fields. |
| 36 | + |
| 37 | +The first two steps are self-explanatory and are not covered by this |
| 38 | +enhancement, as the MCO will always perform them. The third one, the |
| 39 | +validation of non-reconcilable fields, is the main target of this enhancement. |
| 40 | + |
| 41 | +The rendered MCs are fetched by new nodes when they join, and at that time |
| 42 | +Ignition itself runs, following the instructions in the rendered MC's Ignition |
| 43 | +configuration. After the first boot, the MCO is in charge of applying supported |
| 44 | +changes to nodes. Even if a configuration adheres to the Ignition schema, not |
| 45 | +all of its fields are supported after the first boot. The non-reconcilable |
| 46 | +validation mechanism alerts the user with a detailed message about the changes |
| 47 | +in MCs that are not supported. |
| 48 | +The main limitation behind the configuration validation is that the MCO does |
| 49 | +not support the complete Ignition schema. Today, the only way for a user to |
| 50 | +make changes that the MCO does not support is to recreate the cluster from |
| 51 | +scratch. |
| 52 | +To avoid tearing down the cluster and recreating it from scratch, this |
| 53 | +enhancement proposes a flag in the MCO MachineConfiguration CR that tells the |
| 54 | +MCO to skip the non-reconcilable validation and let the MCS serve the Ignition |
| 55 | +configuration, which can then be used by new nodes. Unsupported fields are |
| 56 | +harmless for existing nodes: given the new configuration, they only apply |
| 57 | +changes to the fields the MCD supports. |
| 58 | + |
| 59 | +### User Stories |
| 60 | + |
| 61 | +* As a cluster admin, I am adding new nodes to a long-standing cluster and I |
| 62 | +would like to change the partitions for the new nodes. |
| 63 | +* As a cluster admin, I am adding new nodes to a long-standing cluster and the |
| 64 | +new hardware requires a different set of kernel arguments that I would like |
| 65 | +to introduce. |
| 66 | + |
| 67 | +### Goals |
| 68 | + |
| 69 | +* Add a knob in MCO's MachineConfiguration CR to skip non-reconcilable fields |
| 70 | +validation if necessary. |
| 71 | + |
| 72 | +### Non-Goals |
| 73 | + |
| 74 | +* Allow invalid Ignition/MachineConfig fields to be applied. |
| 75 | +* Disable non-reconcilable MCs validation by default. |
| 76 | + |
| 77 | +## Proposal |
| 78 | + |
| 79 | +Update the MachineConfiguration CR by adding a new field to the spec that |
| 80 | +allows users to bypass validation for non-reconcilable MachineConfig changes. |
| 81 | +The field will default to the current behaviour, which is to |
| 82 | +validate all rendered MCs. |
| 83 | + |
| 84 | +The MachineConfig Controller and the MachineConfig Daemons will |
| 85 | +read the new field at runtime and, if the value explicitly states that the |
| 86 | +validation should be skipped, they will let the MC pass and get applied to |
| 87 | +the nodes. |
| 88 | + |
| 89 | +MachineConfig daemons will continue to perform the already supported updates |
| 90 | +to nodes, regardless of whether the non-reconcilable validation is skipped. |
| 91 | +Existing nodes that receive only unsupported changes will skip the |
| 92 | +update and will be considered updated. |
| 93 | + |
| 94 | +### Workflow Description |
| 95 | + |
| 96 | +When an MCO user modifies the cluster-wide MachineConfigs, a new rendered |
| 97 | +MachineConfig CR is created for each pool that has an association with the |
| 98 | +created, modified or deleted MachineConfig. The rendered MachineConfig, before |
| 99 | +being applied, is validated against the Ignition schema. |
| 100 | + |
| 101 | +After the Ignition validation is done, the non-reconcilable fields validation |
| 102 | +is performed or skipped based on the proposed |
| 103 | +`machineConfigurationValidationPolicy` field in the MachineConfiguration CR. |
| 104 | +If the field is set to `Relaxed`, the non-reconcilable fields validation is |
| 105 | +skipped; otherwise, it is performed. |
| 106 | + |
| 107 | +The logic of the non-reconcilable MC validation itself remains as it is; the |
| 108 | +[implementation](https://github.com/openshift/machine-config-operator/blob/e44d380686aee42f784a277236dbac49b083441e/pkg/controller/common/reconcile.go#L69) |
| 109 | +is not changed by this enhancement. |
| 110 | + |
| 111 | +### API Extensions |
| 112 | + |
| 113 | +- Update the MachineConfiguration CRD to add an enumeration field called |
| 114 | +`machineConfigurationValidationPolicy`, which is used as the validation |
| 115 | +skipping toggle. The field does not set a default value, which lets the MCO |
| 116 | +pick what to do in the default case. The enumeration has only two values: |
| 117 | + - Strict: Validation is always performed. This is the value the MCO will |
| 118 | + use as default. |
| 119 | + - Relaxed: The validation of non-reconcilable fields is skipped and only |
| 120 | + the Ignition syntactic validation will be done. |
| 121 | + |
| 122 | +### Risks and Mitigations |
| 123 | + |
| 124 | +By setting `machineConfigurationValidationPolicy` to `Relaxed` the customer |
| 125 | +acknowledges that providing MCs that use Ignition features outside the |
| 126 | +scope of the MCO will lead to a cluster whose nodes use different Ignition |
| 127 | +configurations. |
| 128 | + |
| 129 | +### Drawbacks |
| 130 | + |
| 131 | +None. |
| 132 | + |
| 133 | +## Design Details |
| 134 | + |
| 135 | +### Open Questions [optional] |
| 136 | + |
| 137 | +None. |
| 138 | + |
| 139 | +### Test Plan |
| 140 | + |
| 141 | +MCO e2e tests and unit tests will cover this functionality. |
| 142 | + |
| 143 | +### Graduation Criteria |
| 144 | + |
| 145 | +This feature is behind the tech-preview FeatureGate in 4.20. |
| 146 | +Once it is tested by QE and users it can be GA'd since it should not impact |
| 147 | +daily usage of a cluster. |
| 148 | + |
| 149 | +## Dev Preview -> Tech Preview |
| 150 | + |
| 151 | +Not applicable. Feature introduced in Tech Preview. |
| 152 | + |
| 153 | +## Tech Preview -> GA |
| 154 | + |
| 155 | +Bugs found by e2e tests and QE are fixed. |
| 156 | + |
| 157 | +#### Removing a deprecated feature |
| 158 | + |
| 159 | +### Upgrade / Downgrade Strategy |
| 160 | + |
| 161 | +Upgrades and downgrades are not impacted by whether this feature is present. |
| 162 | + |
| 163 | +### Version Skew Strategy |
| 164 | + |
| 165 | +Not applicable. |
| 166 | + |
| 167 | +### Operational Aspects of API Extensions |
| 168 | + |
| 169 | +#### Failure Modes |
| 170 | + |
| 171 | +If the non-reconcilable configuration validation is performed and it fails, |
| 172 | +the MCO continues to report the failure in the MCP as it already does, by |
| 173 | +setting the MCP's `RenderDegraded` condition to true. |
| 174 | + |
| 175 | +If the configuration reaches the MCD and the non-reconcilable validation |
| 176 | +fails, the MCN `UpdatePrepared` condition is updated with the details of the |
| 177 | +validation failure. |
| 178 | + |
| 179 | +#### Support Procedures |
| 180 | + |
| 181 | +None. |
| 182 | + |
| 183 | +## Implementation History |
| 184 | + |
| 185 | +Not applicable. |
| 186 | + |
| 187 | +## Alternatives (Not Implemented) |
| 188 | + |
| 189 | +Not applicable. |