Unfortunate Dev Disaster because of Eraser #1107

Closed
1 task
116davinder opened this issue Nov 11, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@116davinder

What kind of request is this?

Improvement of existing experience

What is your request or suggestion?

Please include some default exclusions for image removal. In my case, Eraser removed a very important image, and now I am wondering how to fix it without destroying my worker nodes.

Environment: AWS EKS (1.31)
Worker: Bottlerocket Latest at the time of writing

Default Eraser Rule Log
{"level":"info","ts":1731308363.6721404,"logger":"collector","msg":"no images to exclude"}

Eraser in Action Log
{"level":"info","ts":1731308367.6420007,"logger":"remover","msg":"removed image","given":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","imageID":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","name":{"image_id":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc","names":["localhost/kubernetes/pause:0.1.0"]}}
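To audit what was deleted, log lines like the one above can be filtered for removal events. The following is a minimal sketch, assuming the remover always emits one JSON object per line with the `logger`, `msg`, `imageID`, and `name.names` fields seen in this log; the helper name `removed_images` is hypothetical and not part of Eraser.

```python
import json


def removed_images(log_lines):
    """Yield (image_id, names) for each 'removed image' entry from the remover.

    Assumes one JSON object per line, shaped like the log line in this issue.
    """
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue  # ignore JSON values that are not objects
        if entry.get("logger") == "remover" and entry.get("msg") == "removed image":
            names = entry.get("name", {}).get("names", [])
            yield entry.get("imageID"), names


# The log line from this issue, used as sample input.
sample = (
    '{"level":"info","ts":1731308367.6420007,"logger":"remover",'
    '"msg":"removed image",'
    '"given":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc",'
    '"imageID":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc",'
    '"name":{"image_id":"sha256:60eb709f2e5c30f4067e605271d1b1bfff0e32f633a5a02f55a74aa448bfafbc",'
    '"names":["localhost/kubernetes/pause:0.1.0"]}}'
)

for image_id, names in removed_images([sample]):
    print(image_id, names)
```

Any image name that shows up here and is needed by the node itself (such as the pause image) is a candidate for the exclusion list.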

After Eraser
All my deployments and pods are stuck at startup like this:
image

Are you willing to submit PRs to contribute to this feature request?

  • Yes, I am willing to implement it.
@116davinder 116davinder added the enhancement New feature or request label Nov 11, 2024
@sozercan
Member

sozercan commented Nov 11, 2024

@116davinder sorry to hear that. Eraser already supports skipping images by setting up exclusions: https://eraser-dev.github.io/eraser/docs/exclusion

The official pause image comes from registry.k8s.io/pause. Unfortunately, there's no local default for the pause image. I would recommend making the pause image available to pull from a registry and adding it to the exclusion list (so it doesn't get pulled every time).

If you can connect to the nodes, you can pull the pause image and retag it to what the cluster is looking for, or update the kubelet sandbox image config (https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd) as a mitigation.

This is related to #380, which defines pinned images at the containerd level. I believe this is closer to what you are looking for.
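The exclusion approach mentioned in this comment can be sketched as follows. This is a minimal example, assuming the exclusion list is a JSON document with a top-level `excluded` array as described in the linked exclusion docs; the schema and how the file is loaded into the cluster should be verified against those docs. The helper name `build_exclusion_list` is hypothetical.

```python
import json


def build_exclusion_list(images):
    """Return exclusion-list JSON for the given image references.

    Assumption: Eraser reads a document shaped like {"excluded": [...]};
    confirm the exact schema in the exclusion docs before using this.
    """
    # Deduplicate and sort so the generated file is stable across runs.
    return json.dumps({"excluded": sorted(set(images))}, indent=2)


# The pause image name reported in this issue's remover log.
print(build_exclusion_list(["localhost/kubernetes/pause:0.1.0"]))
```

Generating the file from a list makes it easy to keep node-critical images, such as the pause image, under version control alongside the cluster configuration.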

@116davinder
Author

@sozercan, this image, localhost/kubernetes/pause, comes from Bottlerocket. They do allow setting a different pause image, but most of our clusters were already built when this happened, so we can't change it.

Unfortunately, all my worker nodes (Bottlerocket OS) are locked down, so no one can access them :(

Luckily, I later found out that I had destroyed only one cluster, and my team managed to recreate the worker nodes to fix it. For now, I am using an exclusion policy to skip the pause image until my team moves to the registry.k8s.io/pause image.

It is similar to the issue you mentioned, #380. If you like, we can close this issue and continue the discussion in #380.

@sozercan
Member

Closing, we'll track it in #380
