Emit DisruptionBlocked events on affected pod or pdb resource #2016

Open
cnmcavoy opened this issue Feb 20, 2025 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@cnmcavoy
Contributor

Description

What problem are you trying to solve?
As a cluster admin, I use Karpenter's events to understand and triage cases where disruption occurs less frequently than expected. Karpenter emits DisruptionBlocked events when a node cannot be disrupted, and if the cause is a pod (with a do-not-disrupt annotation) or a PDB, the resource name and namespace are included in the event message:

func Blocked(node *corev1.Node, nodeClaim *v1.NodeClaim, msg string) (evs []events.Event) {
	if node != nil {
		evs = append(evs, events.Event{
			InvolvedObject: node,
			Type:           corev1.EventTypeNormal,
			Reason:         events.DisruptionBlocked,
			Message:        msg,
			DedupeValues:   []string{string(node.UID)},
		})
	}
	if nodeClaim != nil {
		evs = append(evs, events.Event{
			InvolvedObject: nodeClaim,
			Type:           corev1.EventTypeNormal,
			Reason:         events.DisruptionBlocked,
			Message:        msg,
			DedupeValues:   []string{string(nodeClaim.UID)},
		})
	}
	return evs
}

Because the InvolvedObject is the node and nodeclaim, the DisruptionBlocked events always end up in the default namespace rather than in the namespace of the pod or PDB. Tools therefore have to parse the message to extract the namespace, which is burdensome (and really hurts the ability to index these events in tools like Datadog). Either emitting these events on the affected resource, or emitting a second, duplicate event on the affected resource, would satisfy our use case.
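
To make the second option concrete, here is a rough sketch of what a "duplicate event on the affected resource" could look like. It is not Karpenter code: `Object`, `Event`, and `BlockedWithCause` are hypothetical stand-ins that only mirror the fields used in the snippet above, so that the example runs with the standard library alone. Deduplicating the extra event on the blocking resource's UID is also an assumption, not a statement about how Karpenter's deduplication behaves.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for the Kubernetes object an event is
// attached to. Namespace is empty for cluster-scoped objects such as Node
// and NodeClaim.
type Object struct {
	Kind      string
	Namespace string
	Name      string
	UID       string
}

// Event mirrors only the fields used in the snippet above. It is NOT
// Karpenter's events.Event type.
type Event struct {
	InvolvedObject Object
	Type           string
	Reason         string
	Message        string
	DedupeValues   []string
}

const (
	eventTypeNormal   = "Normal"
	disruptionBlocked = "DisruptionBlocked"
)

// BlockedWithCause (hypothetical) emits the same node and nodeclaim events
// as today, plus one extra event on the pod or PDB that blocks disruption,
// so that the extra event is recorded against a namespaced object.
func BlockedWithCause(node, nodeClaim, cause *Object, msg string) []Event {
	var evs []Event
	for _, obj := range []*Object{node, nodeClaim, cause} {
		if obj == nil {
			continue
		}
		evs = append(evs, Event{
			InvolvedObject: *obj,
			Type:           eventTypeNormal,
			Reason:         disruptionBlocked,
			Message:        msg,
			// Assumption: one deduplication key per involved object.
			DedupeValues: []string{obj.UID},
		})
	}
	return evs
}

func main() {
	node := &Object{Kind: "Node", Name: "node-a", UID: "uid-node"}
	claim := &Object{Kind: "NodeClaim", Name: "claim-a", UID: "uid-claim"}
	pod := &Object{Kind: "Pod", Namespace: "team-a", Name: "web-0", UID: "uid-pod"}

	msg := `Pod "team-a/web-0" has the do-not-disrupt annotation` // illustrative message only
	for _, ev := range BlockedWithCause(node, claim, pod, msg) {
		fmt.Printf("%s %q (namespace %q): %s\n",
			ev.InvolvedObject.Kind, ev.InvolvedObject.Name, ev.InvolvedObject.Namespace, ev.Reason)
	}
}
```

With something along these lines, the third event carries the pod's namespace on the involved object itself, so an event indexer could filter on it directly instead of parsing the message.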

How important is this feature to you?

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@cnmcavoy cnmcavoy added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 20, 2025
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 20, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jonathan-innis
Member

Interesting -- what's the problem that you are running into here that causes you to need these events? Are users creating faulty PDBs and then you need to inform them that they are fully blocking the node from being disrupted? Same thing with karpenter.sh/do-not-disrupt?
