Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update pod status contstants to account for Evicted and ContainerStatusUnknown and leverage them in sanitize #2314

Conversation

j2udev
Copy link
Contributor

@j2udev j2udev commented Nov 24, 2023

I had a cluster that was blowing up with pods that were being Evicted and others that had ContainerStatusUnknown... figured I would try the new sanitize function, but noticed that it wasn't clearing them out. Updated accordingly.

I did notice some weird behavior in that the sanitize function didn't completely clear out everything in one run... (I had roughly 800-1000 failing pods 😱). I had to run it about 3 times to fully get everything. Is that expected?

I've said it before, but this tool is such a game changer. Thanks very much for all the hard work!

@derailed
Copy link
Owner

@j2udev Thank you for you kind words Joshua!
I haven't tried sanitize with that many pods. I think I had ~200 or so in a bad way.
Curious, do you use PriorityClass on your cluster. Wondering if we either get throttled or pod got rescheduled while sanitizing.
Currently we just log failed deletions but going to just stop sanitization if errors occur.
Would be great see the debug logs in the meantime and see if we errored out that would cause the sanitization to stop while in flight. Thank you for the details here!

@derailed derailed added enhancement New feature or request question Further information is requested InProgress Marks an issue has being worked on labels Nov 29, 2023
@j2udev
Copy link
Contributor Author

j2udev commented Nov 29, 2023

@j2udev Thank you for you kind words Joshua! I haven't tried sanitize with that many pods. I think I had ~200 or so in a bad way. Curious, do you use PriorityClass on your cluster. Wondering if we either get throttled or pod got rescheduled while sanitizing. Currently we just log failed deletions but going to just stop sanitization if errors occur. Would be great see the debug logs in the meantime and see if we errored out that would cause the sanitization to stop while in flight. Thank you for the details here!

We do use PriorityClasses, but not heavily. Unfortunately (fortunately?), our cluster has been healthy since then and it's not overly easy to test getting it back into that state... nor do I think my colleagues would appreciate me intentionally trying to put the cluster into that state 😅. If I see it pop up again though, I can grab the debug logs.

@derailed
Copy link
Owner

derailed commented Dec 1, 2023

Lol - Right don't break your cluster on my account. Nw - will keep this in the pipe.
If you or other folks run into this issue. Please add the gory details. Thank you!!

@derailed
Copy link
Owner

@j2udev need TLC

@derailed derailed added the needs-tlc Pr needs additional updates label Dec 26, 2023
@j2udev j2udev force-pushed the feat/account-for-evicted-pods-and-container-status-unknown branch from 0cbfaec to 777394b Compare January 3, 2024 14:34
@j2udev
Copy link
Contributor Author

j2udev commented Jan 3, 2024

Sorry about that got busy with my day job

Copy link
Owner

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j2udev Thanks for the updates Joshua!

@derailed derailed merged commit d63fe83 into derailed:master Jan 4, 2024
3 checks passed
thejoeejoee pushed a commit to thejoeejoee/k9s that referenced this pull request Feb 23, 2024
placintaalexandru pushed a commit to placintaalexandru/k9s that referenced this pull request Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request InProgress Marks an issue has being worked on needs-tlc Pr needs additional updates question Further information is requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants