Node Wizard is a controller that monitors node readiness. It cordons and drains nodes that are not ready, and uncordons them when they become ready again.
The purpose of Node Wizard is to automate the response to nodes entering the "NotReady" state. It cordons the affected node and evacuates its workloads so that they are rescheduled on other nodes, then keeps the node cordoned until it becomes ready again. Because the controller reacts as soon as it observes the state change, rather than waiting for human intervention, workloads are moved with minimal downtime.
Additionally, Node Wizard accounts for cases where the node may recover on its own over time. In such situations, there may not be an immediate urgency, allowing for investigation at a later time. When the node becomes ready again, the controller automatically uncordons it.
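To make the behavior concrete, the following is a minimal sketch of the manual steps that Node Wizard automates, written with standard kubectl commands. It is not Node Wizard's implementation: the function names are hypothetical, the drain flags are a common choice rather than the controller's actual settings, and the sketch uncordons any ready node that is cordoned, which may differ from what the controller does.

```shell
# Illustrative only: a one-shot, manual equivalent of the behavior described above.

# Print the status of the node's Ready condition: True, False, or Unknown.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Print "true" if the node is cordoned (spec.unschedulable), otherwise nothing.
node_is_cordoned() {
  kubectl get node "$1" -o jsonpath='{.spec.unschedulable}'
}

# Cordon and drain a node that is not ready; uncordon a ready node that is
# still cordoned. The body runs in a subshell so its variables stay local.
reconcile_node() (
  node="$1"
  ready="$(node_ready_status "$node")"
  cordoned="$(node_is_cordoned "$node")"

  if [ "$ready" != "True" ]; then
    # Stop new pods from landing on the node, then evict the existing ones.
    kubectl cordon "$node"
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  elif [ "$cordoned" = "true" ]; then
    # The node has recovered, so make it schedulable again.
    kubectl uncordon "$node"
  fi
)
```

Calling `reconcile_node <node-name>` performs a single check. The controller's value lies in running this kind of check continuously and without an operator.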
Node Wizard offers the following features:
- Drain: Non-graceful draining parameters can be set via an environment variable.
- Uncordon: The node is uncordoned when it becomes ready again.
- Ignore Some Nodes: The controller ignores nodes labeled with node-wizard/ignore=true. This is useful for nodes that are ready but undergoing maintenance.
- Leader Election: The application uses a leader election mechanism, which is useful for high availability.
- Metrics: Currently, two metrics are exposed:

| Metric Name | Metric Type | Description |
| --- | --- | --- |
| node_wizard_uncordon_count | Counter | Number of uncordon operations performed for each node. |
| node_wizard_drained_count | Counter | Number of drain operations performed for each node. |

- Time to uncordon: This feature is planned for a future release.
- Time to cordon: The default node monitor grace period is 40 seconds. Because this is already quite a long time, Node Wizard does not wait any further by default. However, this feature may be added in the future.
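The ignore label mentioned in the feature list can be managed with standard kubectl commands. The sketch below wraps them in small helper functions whose names are hypothetical. It also assumes that removing the label puts the node back under the controller's management, which the text above does not state explicitly.

```shell
# Exclude a node from Node Wizard's handling, e.g. before planned maintenance.
ignore_node() {
  kubectl label node "$1" node-wizard/ignore=true --overwrite
}

# Remove the label again (a trailing "-" deletes a label in kubectl).
unignore_node() {
  kubectl label node "$1" node-wizard/ignore-
}

# List the nodes that currently carry the ignore label.
list_ignored_nodes() {
  kubectl get nodes -l node-wizard/ignore=true
}
```

For example, `ignore_node worker-1` before maintenance and `unignore_node worker-1` afterwards, where `worker-1` is a placeholder node name.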
# to add the Helm repository
helm repo add cnwizards https://charts.cloudnativewizards.dev
# to install the Helm charts
helm install node-wizard cnwizards/node-wizard --namespace node-wizard --create-namespace
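
After installation, the release can be checked with standard Helm and kubectl commands. This is a minimal sketch that reuses the release name and namespace from the commands above. The function name is hypothetical, and the pod names in the output depend on the chart.

```shell
# Show the Helm release status, then list the pods in the node-wizard namespace.
verify_install() {
  helm status node-wizard --namespace node-wizard &&
    kubectl get pods --namespace node-wizard
}
```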