Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unregisteredtaint controller is not necessary #135

Open
jxs1211 opened this issue Nov 20, 2024 · 5 comments
Open

unregisteredtaint controller is not necessary #135

jxs1211 opened this issue Nov 20, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@jxs1211
Copy link
Contributor

jxs1211 commented Nov 20, 2024

What happened:
unregisteredtaint controller want to delete the unregistered taint here
pkg/controllers/nodeclaim/unregisteredtaint/controller.go

		nodeCopy := node.DeepCopy()
		// remove the unregistered taint
		nodeCopy.Spec.Taints = lo.Reject(nodeCopy.Spec.Taints, func(item corev1.Taint, index int) bool {
			return item.MatchTaint(&v1.UnregisteredNoExecuteTaint)
		})

if the node has unregisteredtaint, the taint will be deleted in registration process.
https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/nodeclaim/lifecycle/registration.go#L97

So, the controller is not necessary here.

What you expected to happen:
delete the controller or removing the related code

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Karpenter-provider-alibabacloud version (use git describe --tags --dirty --always): 9ef9410
  • AliCloud ACK version: N/A
  • Others: N/A
@jxs1211
Copy link
Contributor Author

jxs1211 commented Nov 20, 2024

@jwcesign

@peng19940915
Copy link
Contributor

taint := karpv1.UnregisteredNoExecuteTaint

UnregisteredNoExecuteTaint is injected by UserData, If the data in the node's UserData is not executed in time, UnregisteredNoExecuteTaint will not be attached to the node.
It is very likely that the node has already completed the 'Registration' phase at this point (since it is not guaranteed that 'Registration' will only end after UserData has been executed). Once UserData has been executed, 'UnregisteredNoExecuteTaint' is attached to the node, triggering the controller, which then enters the Init phase.
If the Init phase detects 'UnregisteredNoExecuteTaint', it will directly reject any subsequent actions, keeping the node in an unusable state, even if it is already Ready.
The node will remain in this state unless the taint is manually removed.

@jwcesign jwcesign added the enhancement New feature or request label Nov 27, 2024
@jxs1211
Copy link
Contributor Author

jxs1211 commented Jan 14, 2025

taint := karpv1.UnregisteredNoExecuteTaint

UnregisteredNoExecuteTaint is injected by UserData, If the data in the node's UserData is not executed in time, UnregisteredNoExecuteTaint will not be attached to the node.
It is very likely that the node has already completed the 'Registration' phase at this point (since it is not guaranteed that 'Registration' will only end after UserData has been executed). Once UserData has been executed, 'UnregisteredNoExecuteTaint' is attached to the node, triggering the controller, which then enters the Init phase.
If the Init phase detects 'UnregisteredNoExecuteTaint', it will directly reject any subsequent actions, keeping the node in an unusable state, even if it is already Ready.
The node will remain in this state unless the taint is manually removed.

So my understanding is the Registration may happen before UserData's execution, if so, the taint can't be removed and cause the node state to be unknown, so the unregisteredtaint controller is necessary to guarantee the taint to be removed.

I checked the code, the taint will be added to the node in the launch phase, then it should be removed in the registration phase, from the order's perspective it should work.

the registered phase will check the taint's existence, if not it will be reconciled, until the controller saw the taint and remove it, which means the taint added through the userData will be checked periodically and removed eventually, so I think it should work as expected.

	// check if sync succeeded but setting the registered status condition failed
	// if sync succeeded, then the label will be present and the taint will be gone
	if _, ok := node.Labels[v1.NodeRegisteredLabelKey]; !ok && !hasStartupTaint {
		nodeClaim.StatusConditions().SetFalse(v1.ConditionTypeRegistered, "UnregisteredTaintNotFound", fmt.Sprintf("Invariant violated, %s taint must be present on Karpenter-managed nodes", v1.UnregisteredTaintKey))
		return reconcile.Result{}, fmt.Errorf("missing required startup taint, %s", v1.UnregisteredTaintKey)
	}

@jwcesign
Copy link
Contributor

the taint added through the userData will be checked periodically and removed eventually,

If it works in this way, this controller is not necessary.
I think we need to reproduce the bug which @peng19940915 meet(maybe try many times of node creation), if can't, we can remove this controller.

@jxs1211
Copy link
Contributor Author

jxs1211 commented Jan 16, 2025

the taint added through the userData will be checked periodically and removed eventually,

If it works in this way, this controller is not necessary. I think we need to reproduce the bug which @peng19940915 meet(maybe try many times of node creation), if can't, we can remove this controller.

I notice the userData will be resolved when the script can be get from cache, but I didn't find the cache adding operation is, does that could cause the issue?

	if cachedScript, ok := p.cache.Get(p.clusterID); ok {
		return p.resolveUserData(cachedScript.(string), labels, nodeClaim, kubeletCfg), nil
	}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants