Unfortunate Dev Disaster because of Eraser #1107

116davinder · 2024-11-11T07:35:45Z

What kind of request is this?

Improvement of existing experience

What is your request or suggestion?

Please include some default exclusions for image removal. In my case Eraser removed a very important image, and now I am wondering how to fix it without destroying my worker nodes.

Environment: AWS EKS (1.31)
Worker nodes: Bottlerocket (latest release at the time of writing)

Default Eraser Rule Log
{"level":"info","ts":1731308363.6721404,"logger":"collector","msg":"no images to exclude"}

Eraser in Action Log
{"level":"info","ts":1731308367.6420007,"logger":"remover","msg":"removed image","given":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","imageID":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","name":{"image_id":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","names":["localhost/kubernetes/pause:0.1.0"]}}
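To see exactly which images Eraser removed, the remover's JSON log lines (like the one above) can be scanned programmatically. This is a minimal Python sketch, assuming the field names shown in this log line (`logger`, `msg`, `imageID`, `name.names`); the `looks_like_pause` heuristic is a hypothetical helper, not part of Eraser.

```python
import json


def removed_images(log_lines):
    """Collect image names that Eraser's "remover" logger reports as removed.

    Assumes one JSON object per line, shaped like the log line above.
    """
    removed = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON noise
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("logger") != "remover" or entry.get("msg") != "removed image":
            continue
        names = (entry.get("name") or {}).get("names") or []
        # Fall back to the image ID when the removed image had no tags.
        removed.extend(names if names else [entry.get("imageID", "")])
    return removed


def looks_like_pause(image_name):
    # Hypothetical heuristic: the repository's last path segment is "pause".
    repo = image_name.split("@", 1)[0].rsplit("/", 1)[-1]
    return repo.split(":", 1)[0] == "pause"


if __name__ == "__main__":
    sample = '{"level":"info","ts":1731308367.6420007,"logger":"remover","msg":"removed image","given":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","imageID":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","name":{"image_id":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","names":["localhost/kubernetes/pause:0.1.0"]}}'
    for img in removed_images([sample]):
        if looks_like_pause(img):
            # prints: WARNING: pause image removed: localhost/kubernetes/pause:0.1.0
            print("WARNING: pause image removed:", img)
```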

After Eraser ran, all my deployments/pods got stuck at startup.

Are you willing to submit PRs to contribute to this feature request?

Yes, I am willing to implement it.


sozercan · 2024-11-11T18:35:01Z

@116davinder Sorry to hear that. Eraser already supports skipping images through exclusions: https://eraser-dev.github.io/eraser/docs/exclusion

The official pause image is registry.k8s.io/pause. Unfortunately, there's no local default for the pause image. I would recommend making the pause image pullable from a registry and adding it to the exclusion list (so it doesn't get pulled every time).
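As a rough illustration of such an exclusion, the sketch below builds a ConfigMap manifest listing the pause images to skip. It assumes, per the Eraser exclusion docs linked above (check against your Eraser version), that exclusion lists are ConfigMaps in the `eraser-system` namespace labelled `eraser.sh/exclude.list=true` whose data value is JSON of the form `{"excluded": [...]}`. The ConfigMap name, the data key, and the `registry.k8s.io/pause:3.10` tag are hypothetical placeholders; use the exact image references your nodes actually run.

```python
import json

# Assumptions based on the Eraser exclusion docs; verify for your version.
ERASER_NAMESPACE = "eraser-system"
EXCLUDE_LABEL = {"eraser.sh/exclude.list": "true"}


def exclusion_configmap(name, images, data_key="excluded.json"):
    """Build a ConfigMap manifest (as a dict) listing images Eraser should skip."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": ERASER_NAMESPACE,
            "labels": dict(EXCLUDE_LABEL),
        },
        # The data value is itself a JSON document, so it is stored as a string.
        "data": {data_key: json.dumps({"excluded": list(images)})},
    }


if __name__ == "__main__":
    manifest = exclusion_configmap(
        "excluded-pause-images",  # hypothetical name
        [
            "localhost/kubernetes/pause:0.1.0",  # image removed in the log above
            "registry.k8s.io/pause:3.10",  # placeholder tag
        ],
    )
    # kubectl accepts JSON manifests, e.g.: python this_script.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

Emitting the manifest as JSON keeps the sketch dependency-free; the same structure could equally be written by hand as YAML.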

If you can connect to the nodes, you can pull the pause image and retag it to the name the cluster is looking for, or, as a mitigation, update the kubelet sandbox image config: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd

This is related to #380 that defines pinned images in containerd level. I believe this is closer to what you are looking for.

116davinder · 2024-11-12T17:37:10Z

@sozercan, this image (localhost/kubernetes/pause) comes from Bottlerocket. Bottlerocket does allow setting a different pause image, but most of our clusters had already been built when this happened, so we can't change it.

Unfortunately, all my worker nodes (Bottlerocket OS) are locked down, so no one can access them :(

Luckily, I later found out that I had broken only one cluster, and my team managed to recreate its worker nodes to fix it. For now, I am using an exclusion policy to skip the pause image until my team moves to the registry.k8s.io/pause image.
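Because the exclusion list is the only thing protecting the pause image in the meantime, it may be worth checking that every pause image a node uses is actually covered by it. This is a minimal sketch that uses shell-style globbing (`fnmatch.fnmatchcase`) as a stand-in for Eraser's own matching rules, which may differ; the node image list, the exclusion patterns, and the `is_pause_image` heuristic are hypothetical examples.

```python
from fnmatch import fnmatchcase


def uncovered(images, patterns):
    """Return the images that match none of the exclusion patterns.

    Approximation: shell-style globbing (fnmatchcase) stands in for Eraser's
    own matching, which may treat wildcards differently.
    """
    return [img for img in images if not any(fnmatchcase(img, p) for p in patterns)]


def is_pause_image(image):
    # Hypothetical heuristic: the repository's last path segment is "pause".
    repo = image.split("@", 1)[0].rsplit("/", 1)[-1]
    return repo.split(":", 1)[0] == "pause"


if __name__ == "__main__":
    node_images = [  # hypothetical node image list
        "localhost/kubernetes/pause:0.1.0",
        "registry.k8s.io/pause:3.10",
        "docker.io/library/nginx:1.27",
    ]
    exclusions = ["localhost/kubernetes/pause:0.1.0"]
    missing = uncovered([i for i in node_images if is_pause_image(i)], exclusions)
    print(missing)  # ['registry.k8s.io/pause:3.10']
```

Here the check flags that switching to registry.k8s.io/pause would leave the new image unprotected unless the exclusion list is updated at the same time.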

It is similar to the issue you mentioned, #380. If you like, we can close this issue and continue the discussion in #380.

sozercan · 2024-11-13T19:41:28Z

Closing, we'll track it in #380

116davinder added the enhancement New feature or request label Nov 11, 2024

sozercan closed this as completed Nov 13, 2024
