| 1 | +--- |
| 2 | +title: machine-config-non-reconcilable-changes |
| 3 | +authors: |
| 4 | + - "@pablintino" |
| 5 | +reviewers: |
| 6 | + - "@yuqi-zhang" |
| 7 | +approvers: |
| 8 | + - "@yuqi-zhang" |
| 9 | +api-approvers: |
| 10 | + - "@JoelSpeed" |
| 11 | +creation-date: 2025-04-23 |
| 12 | +tracking-link: |
| 13 | + - https://issues.redhat.com/browse/MCO-1002 |
| 14 | +see-also: |
| 15 | +replaces: |
| 16 | +superseded-by: |
| 17 | +--- |
| 18 | + |
| 19 | +# MachineConfig Non-Reconcilable Changes |
| 20 | + |
| 21 | +## Summary |
| 22 | + |
| 23 | +This enhancement describes the context around the MCO validation of |
| 24 | +MachineConfigs and why it needs to be partially skipped under certain |
| 25 | +specific circumstances. |
| 26 | + |
| 27 | +## Motivation |
| 28 | + |
| 29 | +OCP OS configuration is driven by Ignition, which performs a one-shot |
| 30 | +configuration of the OS based on the Ignition spec of the MCO pool each node |
| 31 | +belongs to. Once the user configures install-time parameters, the MCO |
| 32 | +prevents any further changes to non-reconcilable fields. While this is |
| 33 | +generally useful for safety, it becomes problematic for any user who wishes |
| 34 | +to change install-time-only parameters, such as the disk partition schema. |
| 35 | +In the worst case, this prevents scale-up of new nodes with any |
| 36 | +differences incompatible with existing MachineConfigs. |
| 37 | + |
| 38 | +For these users, the only real option today is to re-provision their |
| 39 | +cluster with a new install-time configuration, which is costly and |
| 40 | +time-consuming. We would like to introduce the ability for users in these |
| 41 | +scenarios to instruct the MCO to allow non-reconcilable MachineConfig changes |
| 42 | +to be applied, skipping existing nodes, and to serve this configuration to new |
| 43 | +nodes joining the cluster. Invalid Ignition is out of scope and is still rejected. |
| 44 | + |
| 45 | +### User Stories |
| 46 | + |
| 47 | +* As a cluster admin, I am adding new nodes to a long-standing cluster and I |
| 48 | +would like to change the partition schema for the new nodes. |
| 49 | +* As a cluster admin, I am adding new nodes to a long-standing cluster and the |
| 50 | +new hardware has a different set of disks that requires different `disks` and |
| 51 | +`filesystems` sections. |
| 52 | + |
| 53 | +### Goals |
| 54 | + |
| 55 | +* Allow users to provide MCs with non-reconcilable fields for the use cases |
| 56 | +in which preserving the original install-time parameter values is not |
| 57 | +possible. |
| 58 | + |
| 59 | +### Non-Goals |
| 60 | + |
| 61 | +* Allow invalid Ignition/MachineConfig fields to be applied. |
| 62 | +* Disable non-reconcilable MCs validation by default. |
| 63 | + |
| 64 | +## Proposal |
| 65 | + |
| 66 | +Update the MachineConfiguration CR by adding a new field to the spec that |
| 67 | +allows users to bypass validation for non-reconcilable MachineConfig changes. |
| 68 | +When the field is unset, the MCO keeps the current behavior, which is to |
| 69 | +validate all rendered MCs. |
| 70 | + |
| 71 | +The MachineConfig Controller and the MachineConfigDaemons read the new |
| 72 | +field at runtime, and if the value explicitly states that the validation |
| 73 | +should be skipped, they let the MachineConfig pass and get applied to the |
| 74 | +nodes. |
| 75 | + |
| 76 | +MachineConfigDaemons continue to perform the already supported updates |
| 77 | +to nodes, whether or not the non-reconcilable validation is skipped. |
| 78 | +Existing nodes that receive only unsupported changes skip the |
| 79 | +update and are considered updated. |
| 80 | + |
| 81 | +### Workflow Description |
| 82 | + |
| 83 | +Each time a MachineConfig is changed, a new rendered MachineConfig is created |
| 84 | +for each pool associated with the changed MachineConfig. |
| 85 | + |
| 86 | +Before the freshly created rendered MachineConfig passes to the |
| 87 | +MachineConfigDaemons, the MCO performs an internal validation of it that |
| 88 | +can be divided into three phases: |
| 89 | + |
| 90 | +1. Parse the Ignition raw configuration. |
| 91 | +2. Ensure the Ignition configuration is valid. |
| 92 | +3. Ensure there are no changes to non-reconcilable fields. |
| 93 | + |
| 94 | +The first two steps are self-explanatory and are not covered by this |
| 95 | +enhancement, as the MCO will always perform them. The third one, the |
| 96 | +validation of non-reconcilable fields, is the main target of this enhancement. |
| 97 | + |
| 98 | +After the Ignition validation is done, the non-reconcilable fields validation |
| 99 | +is performed or skipped based on the proposed |
| 100 | +`machineConfigurationValidationPolicy` field in the MachineConfiguration CR. |
| 101 | +If the field is set to `Relaxed`, the non-reconcilable fields validation is |
| 102 | +skipped; otherwise it is performed. |
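
The three phases and the policy switch can be sketched as follows. This is a minimal illustration, not the MCO implementation (which is written in Go): the function names, the `ValidationPolicy` class, and the two storage sections used as stand-ins for non-reconcilable fields are all assumptions made for this example, and the Ignition validity check is reduced to a placeholder.

```python
import json
from enum import Enum


class ValidationPolicy(Enum):
    """Hypothetical mirror of the proposed enumeration values."""
    STRICT = "Strict"
    RELAXED = "Relaxed"


# Illustrative subset only; the real set of non-reconcilable fields is
# defined by the MCO and is not reproduced here.
NON_RECONCILABLE_SECTIONS = ("disks", "filesystems")


def parse_ignition(raw):
    """Phase 1: parse the raw Ignition configuration."""
    return json.loads(raw)


def validate_ignition(config):
    """Phase 2 placeholder: a real check would run the Ignition validator."""
    if "version" not in config.get("ignition", {}):
        raise ValueError("invalid Ignition: missing ignition.version")


def check_reconcilable(old, new):
    """Phase 3 placeholder: reject changes to install-time-only sections."""
    for section in NON_RECONCILABLE_SECTIONS:
        if old.get("storage", {}).get(section) != new.get("storage", {}).get(section):
            raise ValueError(f"non-reconcilable change in storage.{section}")


def validate_rendered(old_raw, new_raw, policy=None):
    """Run phases 1 and 2 always; run phase 3 unless the policy is Relaxed."""
    new = parse_ignition(new_raw)
    validate_ignition(new)
    # An unset policy is treated as Strict, matching the proposed default.
    effective = policy or ValidationPolicy.STRICT
    if effective is not ValidationPolicy.RELAXED:
        check_reconcilable(parse_ignition(old_raw), new)
    return new
```

With `Relaxed`, a rendered config that adds a `storage.disks` entry passes, while the same config is rejected when the policy is unset or `Strict`. A config that fails phase 1 or 2 is rejected under either policy.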
| 103 | + |
| 104 | +The non-reconcilable MachineConfig validation logic itself remains as it |
| 105 | +is: the [implementation](https://github.com/openshift/machine-config-operator/blob/e44d380686aee42f784a277236dbac49b083441e/pkg/controller/common/reconcile.go#L69) |
| 106 | +is not changed by this enhancement. |
| 107 | + |
| 108 | +After the validation checks are done, the nodes of the MCP are updated to |
| 109 | +point to the new rendered MachineConfig as the desired one, and the MCD starts |
| 110 | +to apply the requested changes. |
| 111 | + |
| 112 | +The MCD only applies changes to the supported fields; any change to the |
| 113 | +MC outside the supported ones is ignored. |
| 114 | + |
| 115 | +After all the changes are applied, the MCD updates the Node annotations to |
| 116 | +point `machineconfiguration.openshift.io/currentConfig` to the updated one and |
| 117 | +`machineconfiguration.openshift.io/state` to `Done` if the update succeeds. |
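
The node-side behavior described above can be sketched as follows. The two annotation keys come from the text; everything else is an assumption for illustration: `apply_update` is a hypothetical helper, configs are modeled as flat dictionaries, and `SUPPORTED_KEYS` is an invented split rather than the MCD's actual list of supported fields.

```python
CURRENT_CONFIG = "machineconfiguration.openshift.io/currentConfig"
STATE = "machineconfiguration.openshift.io/state"

# Hypothetical split used only for this sketch.
SUPPORTED_KEYS = {"files", "units"}


def apply_update(node_annotations, current, desired, desired_name):
    """Apply supported changes, ignore the rest, then mark the node as done."""
    changed = {
        key
        for key in set(current) | set(desired)
        if current.get(key) != desired.get(key)
    }
    applied = sorted(changed & SUPPORTED_KEYS)
    ignored = sorted(changed - SUPPORTED_KEYS)
    # A real daemon would write files and units to disk here; the sketch only
    # records which keys would have been applied.
    node_annotations[CURRENT_CONFIG] = desired_name
    node_annotations[STATE] = "Done"
    return applied, ignored
```

A node whose only difference is in an unsupported key ends up with nothing applied, yet its annotations still point at the new rendered config with state `Done`, which is the "considered updated" case.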
| 118 | + |
| 119 | +### API Extensions |
| 120 | + |
| 121 | +- Update the MachineConfiguration CRD to add an enumeration field called |
| 122 | +`machineConfigurationValidationPolicy` that is used as the validation |
| 123 | +skipping toggle. The CRD does not set a default value for the field, which |
| 124 | +lets the MCO pick what to do when it is unset. The enumeration has only two values: |
| 125 | + - Strict: Validation is always performed. This is the value the MCO |
| 126 | + uses as default. |
| 127 | + - Relaxed: The validation of non-reconcilable fields is skipped and only |
| 128 | + the Ignition syntactic validation is done. |
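
As an illustration of how a cluster admin might opt in, the fragment below sets the proposed field on the cluster-wide MachineConfiguration object. It is a sketch under assumptions: the field is assumed to sit directly under `spec`, and the final name and placement depend on API review.

```yaml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  # Proposed field; omit it (or set Strict) to keep full validation.
  machineConfigurationValidationPolicy: Relaxed
```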
| 129 | + |
| 130 | +### Risks and Mitigations |
| 131 | + |
| 132 | +By setting `machineConfigurationValidationPolicy` to `Relaxed`, the customer |
| 133 | +acknowledges that providing MCs that use Ignition features outside the |
| 134 | +scope of the MCO will lead to a cluster with nodes using different Ignition |
| 135 | +configurations. |
| 136 | + |
| 137 | +### Drawbacks |
| 138 | + |
| 139 | +None. |
| 140 | + |
| 141 | +## Design Details |
| 142 | + |
| 143 | +### Open Questions [optional] |
| 144 | + |
| 145 | +None. |
| 146 | + |
| 147 | +### Test Plan |
| 148 | + |
| 149 | +MCO e2e tests and unit tests will cover this functionality. |
| 150 | + |
| 151 | +### Graduation Criteria |
| 152 | + |
| 153 | +This feature is behind the tech-preview FeatureGate in 4.20. |
| 154 | +Once it is tested by QE and users it can be GA'd since it should not impact |
| 155 | +daily usage of a cluster. |
| 156 | + |
| 157 | +#### Dev Preview -> Tech Preview |
| 158 | + |
| 159 | +Not applicable. Feature introduced in Tech Preview. |
| 160 | + |
| 161 | +#### Tech Preview -> GA |
| 162 | + |
| 163 | +Bugs found by e2e tests and QE are fixed. |
| 164 | + |
| 165 | +#### Removing a deprecated feature |
| 166 | + |
| 167 | +### Upgrade / Downgrade Strategy |
| 168 | + |
| 169 | +Upgrades and downgrades are not impacted by the presence or absence of this feature. |
| 170 | + |
| 171 | +### Version Skew Strategy |
| 172 | + |
| 173 | +Not applicable. |
| 174 | + |
| 175 | +### Operational Aspects of API Extensions |
| 176 | + |
| 177 | +#### Failure Modes |
| 178 | + |
| 179 | +If the non-reconcilable configuration validation is performed and it fails, |
| 180 | +the MCO continues to report the failure in the MCP as it already does, by |
| 181 | +setting the `RenderDegraded` condition of the MCP to true. |
| 182 | + |
| 183 | +If the configuration reaches the MCD and the non-reconcilable validation |
| 184 | +fails, the MCN `UpdatePrepared` condition is updated with the details of the |
| 185 | +validation failure. |
| 186 | + |
| 187 | +#### Support Procedures |
| 188 | + |
| 189 | +None. |
| 190 | + |
| 191 | +## Implementation History |
| 192 | + |
| 193 | +Not applicable. |
| 194 | + |
| 195 | +## Alternatives (Not Implemented) |
| 196 | + |
| 197 | +Not applicable. |