Commit 1ac85db

committed
Add MCO-1002 enhancement
1 parent 995b620 commit 1ac85db

1 file changed

+197
-0
lines changed

@@ -0,0 +1,197 @@
1+
---
2+
title: machine-config-non-reconcilable-changes
3+
authors:
4+
- "@pablintino"
5+
reviewers:
6+
- "@yuqi-zhang"
7+
approvers:
8+
- "@yuqi-zhang"
9+
api-approvers:
10+
- "@JoelSpeed"
11+
creation-date: 2025-04-23
12+
tracking-link:
13+
- https://issues.redhat.com/browse/MCO-1002
14+
see-also:
15+
replaces:
16+
superseded-by:
17+
---
18+
19+
# MachineConfig Non-Reconcilable Changes
20+
21+
## Summary
22+
23+
This enhancement describes the context around the MCO validation of
24+
MachineConfigs and why it needs to be partially skipped under certain
25+
specific circumstances.
26+
27+
## Motivation
28+
29+
OCP OS configuration is driven by Ignition, which performs a one-shot
30+
configuration of the OS based on the Ignition spec of the MCO Pool each node
31+
belongs to. Once the user configures install-time parameters, the MCO will
32+
prevent any further changes to non-reconcilable fields. While this is
33+
generally useful for safety, it becomes problematic for any user who wishes
34+
to change install-time-only parameters, such as the disk partition schema.
35+
In the worst case, this would prevent scaleup of new nodes with any
36+
differences incompatible with existing MachineConfigs.
37+
38+
For these users, their only real option today would be to re-provision their
39+
cluster with new install-time configuration, which is costly and time
40+
consuming. We would like to introduce the ability for users in these scenarios
41+
to instruct the MCO to allow non-reconcilable MachineConfig changes to be
42+
applied, skipping existing nodes, and serve this configuration to new nodes
43+
joining the cluster. Invalid Ignition is out of scope and is still rejected.
44+
45+
### User Stories
46+
47+
* As a cluster admin, I am adding new nodes to a long-standing cluster and I
48+
would like to change the partition schema for the new nodes.
49+
* As a cluster admin, I am adding new nodes to a long-standing cluster and the
50+
new hardware has a different set of disks that requires different `disks` and
51+
`filesystems` sections.
52+
53+
### Goals
54+
55+
* Allow users to provide MCs with non-reconcilable fields for the use cases
56+
in which preserving the original install-time parameter values is not
57+
possible.
58+
59+
### Non-Goals
60+
61+
* Allow invalid Ignition/MachineConfig fields to be applied.
62+
* Disable non-reconcilable MC validation by default.
63+
64+
## Proposal
65+
66+
Update the MachineConfiguration CR by adding a new field to the spec that
67+
allows users to bypass validation for non-reconcilable MachineConfig changes.
68+
The field defaults to the current behavior, which is to validate all
69+
rendered MCs.
70+
71+
The MachineConfig Controller and the MachineConfigDaemons will
72+
read the new field at runtime, and if the value explicitly states that the
73+
validation should be skipped, they will let the MachineConfig pass and be
74+
applied to the nodes.
75+
76+
MachineConfig daemons will continue to perform the already supported updates
77+
to nodes, regardless of whether the non-reconcilable validation is skipped.
78+
Existing nodes that receive only unsupported changes will skip the
79+
update and will be considered updated.
80+
81+
### Workflow Description
82+
83+
Each time a MachineConfig is changed, a new rendered MachineConfig is created
84+
for each pool associated with the changed MachineConfig.
85+
86+
Before the freshly created rendered MachineConfig is passed to the
87+
MachineConfigDaemons, the MCO performs an internal validation of it that can
88+
be divided into three phases:
89+
90+
1. Parse the Ignition raw configuration.
91+
2. Ensure the Ignition configuration is valid.
92+
3. Ensure there are no changes to non-reconcilable fields.
93+
94+
The first two steps are self-explanatory and are not covered by this
95+
enhancement, as the MCO will always perform them. The third one, the
96+
validation of non-reconcilable fields, is the main target of this enhancement.
97+
98+
After the Ignition validation is done, the non-reconcilable fields validation
99+
is performed or skipped based on the proposed
100+
`machineConfigurationValidationPolicy` field in the MachineConfiguration CR.
101+
If the field is set to `Relaxed`, the non-reconcilable fields validation is
102+
skipped; otherwise it is performed.
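
The validation flow described above can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Policy` type, the `Checks` struct, `ErrNonReconcilable`, and `ValidateRendered` are hypothetical names invented for this example and are not MCO functions. Only the three phases and the `Strict`/`Relaxed` values come from this document.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy mirrors the proposed machineConfigurationValidationPolicy values.
// The Go type and constant names are assumptions for this sketch.
type Policy string

const (
	Strict  Policy = "Strict"
	Relaxed Policy = "Relaxed"
)

// ErrNonReconcilable is a hypothetical sentinel that marks phase 3 failures.
var ErrNonReconcilable = errors.New("non-reconcilable change")

// Checks bundles the three phases as caller-supplied functions.
// They are placeholders, not real MCO functions.
type Checks struct {
	Parse        func(raw []byte) error            // phase 1: parse the raw Ignition
	ValidateIgn  func(raw []byte) error            // phase 2: Ignition validity
	Reconcilable func(oldRaw, newRaw []byte) error // phase 3: non-reconcilable fields
}

// ValidateRendered always runs phases 1 and 2, and runs phase 3 unless
// the policy is Relaxed. Any other value behaves like Strict.
func ValidateRendered(policy Policy, oldRaw, newRaw []byte, c Checks) error {
	if err := c.Parse(newRaw); err != nil {
		return fmt.Errorf("parsing Ignition: %w", err)
	}
	if err := c.ValidateIgn(newRaw); err != nil {
		return fmt.Errorf("validating Ignition: %w", err)
	}
	if policy == Relaxed {
		return nil // phase 3 is skipped on purpose
	}
	if err := c.Reconcilable(oldRaw, newRaw); err != nil {
		return fmt.Errorf("%w: %v", ErrNonReconcilable, err)
	}
	return nil
}
```

Passing the phases in as functions keeps the sketch independent of the real parsing and validation code, which this enhancement leaves untouched.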
103+
104+
The non-reconcilable MachineConfig validation logic itself is not changed by
105+
this enhancement; the existing [implementation](https://github.com/openshift/machine-config-operator/blob/e44d380686aee42f784a277236dbac49b083441e/pkg/controller/common/reconcile.go#L69)
106+
is kept as is.
107+
108+
After the validation checks are done, the nodes of the MCP are updated to point
109+
to the new rendered MachineConfig as the desired one, and the MCD starts to
110+
apply the requested changes.
111+
112+
The MCD only applies changes to the supported fields; any change to the
113+
MC outside the supported ones is ignored.
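
A minimal Go sketch of that filtering step follows. It assumes changes can be represented as a list of field paths and that the caller supplies a predicate for what counts as supported. `SplitChanges` and its parameters are hypothetical names for this example, and the sketch does not claim to list which fields the MCD actually supports.

```go
package main

// SplitChanges partitions changed field paths into those the daemon
// would apply and those it would ignore. The supported predicate is
// supplied by the caller; this sketch does not define the supported set.
func SplitChanges(changed []string, supported func(path string) bool) (apply, ignore []string) {
	for _, p := range changed {
		if supported(p) {
			apply = append(apply, p)
		} else {
			ignore = append(ignore, p)
		}
	}
	// When apply is empty there is nothing for the node to do, which
	// matches the case where the node is considered updated.
	return apply, ignore
}
```

An empty `apply` result corresponds to an existing node that receives only unsupported changes: it skips the update and is still considered updated.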
114+
115+
After all the changes are applied, the MCD updates the Node annotations to
116+
point `machineconfiguration.openshift.io/currentConfig` to the updated one and
117+
`machineconfiguration.openshift.io/state` to `Done` if the update succeeds.
118+
119+
### API Extensions
120+
121+
- Update the MachineConfiguration CRD to add an enumeration field called
122+
`machineConfigurationValidationPolicy`, which is used as the validation
123+
skipping toggle. The field does not set a default value, which lets the MCO
124+
pick what to do in the default case. The enumeration has only two values:
125+
  - Strict: Validation is always performed. This is the value the MCO will
126+
    use as default.
127+
  - Relaxed: The validation of non-reconcilable fields is skipped and only
128+
    the Ignition syntactic validation will be done.
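
To make the defaulting behavior concrete, here is a small Go sketch of the enumeration. The type name, the constant names, and the `EffectivePolicy` helper are assumptions made for this example and are not taken from the actual API change. Only the field name and the `Strict` and `Relaxed` values come from this document.

```go
package main

// ValidationPolicy is a hypothetical Go type for the proposed
// machineConfigurationValidationPolicy enumeration.
type ValidationPolicy string

const (
	// PolicyUnset models the field being omitted from the CR.
	PolicyUnset   ValidationPolicy = ""
	PolicyStrict  ValidationPolicy = "Strict"
	PolicyRelaxed ValidationPolicy = "Relaxed"
)

// EffectivePolicy resolves the value the operator would act on.
// Only an explicit Relaxed disables the non-reconcilable check; an
// omitted field, or any other value, resolves to Strict.
func EffectivePolicy(p ValidationPolicy) ValidationPolicy {
	if p == PolicyRelaxed {
		return PolicyRelaxed
	}
	return PolicyStrict
}
```

Resolving the default in the operator rather than in the CRD schema matches the statement that the field sets no default value of its own.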
129+
130+
### Risks and Mitigations
131+
132+
By setting `machineConfigurationValidationPolicy` to `Relaxed`, the customer
133+
acknowledges that providing MCs that make use of Ignition features outside the
134+
scope of the MCO will lead to a cluster with nodes using different Ignition
135+
configurations.
136+
137+
### Drawbacks
138+
139+
None.
140+
141+
## Design Details
142+
143+
### Open Questions [optional]
144+
145+
None.
146+
147+
### Test Plan
148+
149+
MCO e2e tests and unit tests will cover this functionality.
150+
151+
### Graduation Criteria
152+
153+
This feature is behind the tech-preview FeatureGate in 4.20.
154+
Once it is tested by QE and users it can be GA'd since it should not impact
155+
daily usage of a cluster.
156+
157+
#### Dev Preview -> Tech Preview
158+
159+
Not applicable. Feature introduced in Tech Preview.
160+
161+
#### Tech Preview -> GA
162+
163+
Bugs found by e2e tests and QE are fixed.
164+
165+
#### Removing a deprecated feature
166+
167+
### Upgrade / Downgrade Strategy
168+
169+
Upgrades and downgrades are not impacted by whether this feature is present.
170+
171+
### Version Skew Strategy
172+
173+
Not applicable.
174+
175+
### Operational Aspects of API Extensions
176+
177+
#### Failure Modes
178+
179+
If the non-reconcilable configuration validation is performed and it fails,
180+
the MCO continues to report the failure as it already does, by setting the
181+
`RenderDegraded` condition of the MCP to true.
182+
183+
If the configuration reaches the MCD and the non-reconcilable validation
184+
fails, the MCN `UpdatePrepared` condition is updated with the details of the
185+
validation failure.
186+
187+
#### Support Procedures
188+
189+
None.
190+
191+
## Implementation History
192+
193+
Not applicable.
194+
195+
## Alternatives (Not Implemented)
196+
197+
Not applicable.
