-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow default configuration override with annotations #44
Comments
For simplicity, I think annotations should always override the default configuration if present. |
A few thoughts I had on this before I dive into implementation. There are two kinds of environment variables used to configure pod-reaper. One set is specific to the reaper itself (how often it looks at pods, what namespaces it looks at, etc). The other set defines the rules -- these are the things where we can have pods specify their own overrides. Overriding rules on a per pod basis means that those rules wouldn't know anything about the first set of rules (this is particularly important for things like random chance -- where checking more/less often matters) There's another kind of interesting case. Suppose we have a reaper checking every 1 hour for anything that's been running in a specific namespace for > 3 days -- but a pod in that namespace using an annotation to specify an unrelated rule -- like container status is in Right now, I'm thinking of the following:
Now for some fun stuff! What do we want the behaviors to be in the following situations?
So let's say we have Additional thought: this really brings to a light a problem with the chaos chance rule because if it's specified as a pod annotation, then there's a separation between where the chance is defined and how often that chance is applied (the reaper's scheduling) which is just begging for trouble. @andrewwipf and I actually did some white-boarding a couple solutions to that problem (that I think I'd like to implement independently of this feature). |
My initial idea was “no” to keep things simple. However that will probably lead to confusion. I think the following pseudo code will handle all three cases:
|
I agree with that overall approach! |
Worked on this a bit. Doing this in a few steps (in part because I'm a lunatic that likes refactoring). I'll be updating a Originally, with rules only configured at the reaper level, only the rules that were configured were "loaded". This separated the configuration of the reaper (which happened exactly once at startup) from the actual checking of the rules. If we want to enable overrides from things configured outside the reaper itself, rule configuration needs to be more dynamic. To accomplish that, I want to remove the loading phase of this altogether, and instead always be checking each rule -- letting the rule determine if it should be applied or not. This leads to a ternary situation. There are now three cases for a rule evaluation:
We avoided case 3 originally because rules that weren't configured were never loaded.
So far, I think this is going well. Just taking me a longer than I wanted. I really convinced myself that this could be a fun feature when I thought the use case where rather than overriding settings, instead there was a generic reaper running with no-environment variable defined rules. Instead it's an entirely "opt-in with your own per pod rules". One more minor side note: chaos chance does NOT work well with a pod level override. It really needs a time factor to be configured WITH the chance in order to make sense. I have a couple ideas about how to do that, but it'll be separate from this work! |
update: have a working reaper where rules are evaluated on the ternary logic of After all this, adding pod-level overrides will be straightforward! |
Okay. Feeling pretty good about refactoring how rules work in order to help accommodate this request. For right now, the rework should have functional feature parity (I have changed log messages a bit). There are still a few things I want to test out with this, but it should be a total drop-in-place of an older version. From here, the request to add pod-level overrides should be MUCH easier than before. |
#45 has a work-in-progress of a rule refactor, but I feel like I'm hitting a point where "it doesn't feel right" again. For a rules to have multiple sources of values leaves me with a lot of questions about what pod-reaper looks like it should be doing from the multiple perspectives. So I'm going to try to write out a super specific situation and see what people think. Using the example above! Suppose we have the following: Pod reaper is configured with:
A specific pod has the following annotations:
What do we think it should do when it encounters a pod that has been, and is still running (not Evicted) at If I only saw the pod-reaper configuration, I would describe the reaper as "kills pods that have been running for more than 1 day" so I would say "don't kill the pod". If I only saw the pod annotations, I would describe the kill conditions as "the pod is at least 12 hours old AND has been evicted" so I would say "don't kill the pod" If we honor overrides, but only for rules that have been configured (or if we opted to do something like Maybe that "AND" there is what's really causing my confusion here. Maybe the solution here is to have two specific configuration options here:
At this point it sounds like I'm overthinking things... but what I've done for #45 doesn't really make sense for the first of these (which I think is more in line with the topic of this issue). It also doesn't really make sense of the second of these -- but for different reasons. As much as I like the idea of the second case, it really feels like a super specific use case that I shouldn't consider right now. Going to keep thinking about this. |
In general, I like the direction of the refactor. The rules returning a named result instead of a boolean, makes things much clearer what the expected behavior is without having to jump around in the code. I liked
I would add For the Duration example above, since there is an annotation on the pod, I would expect it to be evicted at 12h1m Consider the reverse scenario:
Pod annotations:
In this case, I would expect the pod to be killed in 1 day, even though the default is 12h. I do think explicitly enabling rules would probably lead to the least amount of confusion, but would break backward compatibility with the current configuration. I would release it as v3.0 to indicate that there are breaking changes, and clearly document it. |
Alright. Way more thought than I'd have liked later, here's my plan.
For implementing pod level overrides.
The goal of the last piece is to reduce code duplication and make it easier to work. I'm not exactly sure how that'll look right now, but they idea would be start with a list of all rules, filter out anything not enabled, then provide common methods for getting their configuration value. Something like |
@brianberzins Have you made any more progress on annotations? I just ran into a situation where they would be very useful. I'm willing to lend a hand getting this implemented. |
I need dive back into this! |
I had an immediate need, so I went ahead and implemented this in #64. I'm open to changes. |
I owe you a code review! Hoping I can get to it this evening!
…On Wed, Jan 20, 2021, 9:32 AM Jason Harmon ***@***.***> wrote:
I had an immediate need, so I went ahead and implemented this in #64
<#64>. I'm open to changes.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#44 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAS2AGHVS6GQ54N5QTFHH3TS23ZRJANCNFSM4KUTKXSQ>
.
|
It would be nice to be able to override default pod-reaper settings with annotations.
For example if
MAX_DURATION=1d
, but a pod had the annotationpod-reaper/maxduration: 12h
, then that pod would be reaped in 12 hours.The text was updated successfully, but these errors were encountered: