Add cpu/memory requests and limits #1819
Conversation
(converted to Draft)
Force-pushed from 315bc3f to c9debae
dlb plugin, dsa plugin, fpga plugin, gpu plugin, iaa plugin, qat plugin, sgx plugin. Based on this, I set the values in the yaml files. Please review!
Force-pushed from c9debae to 74fb954
I set the new values. If you think it is fine, let me update the test files!
@eero-t had this suggestion:
I "rounded them up" and made them a minimum of 40m. Didn't he mean that? Or do you think he meant to "add up" 40m-100m more?
I meant that if 2x CPU usage is <100m, just use 100m, because it's better for the limits to be (clearly) larger than needed than for the process to be needlessly throttled. Memory throttling means paging, which means that the performance of all processes on that node will suffer in an effort to keep the given container's memory usage within some artificial limit.
Another reason for having higher limits is that the plugins' resource usage is not going to be measured and the limits updated for every release. Therefore there should be plenty of extra headroom in case plugin size / resource usage increases due to added features (the GPU plugin is the largest, so it can act as a guideline for how large they could grow). The scheduler is not going to block additional workloads from going to a node just because it has a couple of pods requesting a smallish fraction of a single core. :-)
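For illustration only, here is a minimal Go sketch of that sizing rule (double the measured usage, then round up to a floor such as 100m CPU). The measured numbers and floors below are placeholders I picked, not values from this PR, and the helper name is made up.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// limitFor doubles a measured usage value and rounds it up to a floor
// (e.g. 100m CPU), so the plugin is not throttled under normal operation.
func limitFor(measured, floor resource.Quantity) resource.Quantity {
	doubled := resource.NewMilliQuantity(measured.MilliValue()*2, resource.DecimalSI)
	if doubled.Cmp(floor) < 0 {
		return floor
	}
	return *doubled
}

func main() {
	// Placeholder measurements, not numbers taken from this PR.
	cpu := limitFor(resource.MustParse("20m"), resource.MustParse("100m"))
	mem := limitFor(resource.MustParse("15Mi"), resource.MustParse("45Mi"))
	fmt.Println(cpu.String(), mem.String()) // prints: 100m 45Mi
}
```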
It would be good to do this anyway to get the e2e tests going. It would also be useful to come up with a simple test that tries to stress the plugins a bit.
Force-pushed from 74fb954 to 8f576fc
@eero-t Thanks for the explanation! So, in conclusion:
I thought increasing
Not that whether
Umm, @mythi what do you think? And, could anyone help with adding the test case for pkg/controllers/*/controller_test.go?
What's the question?
We have examples under test/e2e: intel-device-plugins-for-kubernetes/test/e2e/qat/qatplugin_dpdk.go, lines 185 to 188 at e578ab9.
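The repo's own e2e helpers aren't reproduced here, but as a rough, self-contained sketch of the kind of check such a test could do (plain client-go; the namespace, label selector, and kubeconfig path below are assumptions, not taken from the repo): list the plugin pods and verify they now carry CPU/memory requests and limits, and report their QoS class.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical kubeconfig, namespace, and label selector; adjust to
	// wherever the plugin DaemonSet pods actually run.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	pods, err := client.CoreV1().Pods("inteldeviceplugins-system").List(
		context.TODO(), metav1.ListOptions{LabelSelector: "app=intel-qat-plugin"})
	if err != nil {
		log.Fatal(err)
	}

	for _, pod := range pods.Items {
		for _, c := range pod.Spec.Containers {
			// Fail the check if a container is missing its CPU request or
			// memory limit; such a pod would not have the intended QoS.
			if c.Resources.Requests.Cpu().IsZero() || c.Resources.Limits.Memory().IsZero() {
				log.Fatalf("pod %s container %s is missing resource requests/limits", pod.Name, c.Name)
			}
		}
		fmt.Printf("%s: QoS=%s\n", pod.Name, pod.Status.QOSClass)
	}
}
```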
Isn't Eero pointing out that limits and requests are different and they need to be equal for QoS? |
My point was about which QoS is wanted for the plugin pods. If it's "Guaranteed", then limits & requests should be equal (and CPU & memory requests should be specified for all containers, if the Pod has multiple ones).
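For reference, a minimal Go sketch of that distinction (the values are illustrative, not the ones chosen in this PR): with requests equal to limits on every container the pod is classified as Guaranteed, while requests lower than limits give Burstable.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Guaranteed QoS: requests and limits are identical for every resource
// (and must be set on every container in the pod).
var guaranteed = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("100m"),
		corev1.ResourceMemory: resource.MustParse("45Mi"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("100m"),
		corev1.ResourceMemory: resource.MustParse("45Mi"),
	},
}

// Burstable QoS: requests are set but lower than the limits, so the pod
// is scheduled against its requests yet may still use spare capacity.
var burstable = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("40m"),
		corev1.ResourceMemory: resource.MustParse("20Mi"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("100m"),
		corev1.ResourceMemory: resource.MustParse("45Mi"),
	},
}

func main() {
	fmt.Println(guaranteed.Requests.Cpu(), burstable.Requests.Cpu())
}
```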
@eero-t Thanks! Mikko said they do not need to be
I thought about
@mythi Umm, we are adding the cpu/memory requests and limits. Or, should we give the possibility for users of the operator to set the values themselves?
I did not propose/mean this. I wrote that if we select this value carefully perhaps we are safe (but we would still set requests).
I thought you meant that we do not need limits since it is enough to set requests.
I think I still do not get your point. We already set requests and limits safely, and you said about
'These' refers to the part about 'giving possibility to users', right?
But the plugin pods can still be evicted, right? My question is how likely a problem that is with the current settings. Also, have you tested what happens to running workloads that are using devices in case a plugin pod gets evicted?
I have not tested what happens to the workloads when the pod gets evicted.
If not, what would be the way to get the pod evicted? But to be honest, I do not see how the matter of pods getting evicted is related to this PR... No one cared about pods getting evicted even though they have had the lowest priority until now...
Yes, according to the k8s pod QoS doc, a best-effort pod (earlier plugins) will be evicted before burstable ones (this PR). I assume this to be the case even when the latter pod exceeds its resource request. However, if a pod exceeds its memory limit, it gets OOM-killed by the kernel & restarted by the kubelet (if its spec allows restart), and eventually the restarts are backed off, regardless of whether the node is short on resources or not. Something like that could happen if there's a condition that triggers a plugin memory leak, so it may be relevant to check.
they were still controllable through
Thanks for the explanations!
Wouldn't this happen even when the pod's QoS is guaranteed? If yes... what do you suggest now? I am lost. Just let me know what I need to do next!
I think we all agreed that additional testing is still needed, especially in node cpu/memory pressure cases. It could also be implemented as an e2e test, as suggested earlier.
Yes. But not for best-effort containers, if the node has free resources. A node running out of resources is a case where many other things could also fail, so it's hard to predict what happens to best-effort containers in that situation, and testing such a condition was IMHO less of a priority. Whereas with specified limits, it's clear what happens, when, and to which pod, so testing that could make more sense.
Testing what happens when limits are crossed:

Startup: an issue where a new version crosses the limits already at startup can be tested by setting the limits too low. But startup failures are not really interesting, as then there's no chance for workloads to request devices provided by the plugin.

Run-time: I do not think the kernel allows lowering memory cgroup limits below the container's current usage. Which means that container resource utilization would need to be increased at run-time, instead of limits being lowered. Because the current containers include just the static plugin binary, not any tools, their allocation cannot be easily increased from the outside. This could be done either by adding some trivial tool for allocating + dirtying (writing to) a given amount of memory, or by adding such an option to the plugin:
(That option should do just that, and none of the other plugin functionality, to avoid messing things up otherwise.) ...and using that at a suitable moment in the test:
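As a hedged sketch of what such a memory-dirtying helper could look like (the flag name and default size are made up here, not something the plugins currently have): allocate the requested amount, write to every page so the usage is actually charged to the container's memory cgroup, then keep the memory resident.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Hypothetical flag; a real plugin option or tool could name it differently.
	mib := flag.Int("dirty-mib", 64, "amount of memory to allocate and write to, in MiB")
	flag.Parse()

	// Allocate and touch every page (4 KiB stride) so the kernel charges the
	// memory to the container's cgroup rather than leaving it untouched.
	buf := make([]byte, *mib*1024*1024)
	for i := 0; i < len(buf); i += 4096 {
		buf[i] = 1
	}

	fmt.Printf("allocated and dirtied %d MiB, holding...\n", *mib)
	// Keep the buffer referenced so it stays resident while the test observes
	// the container hitting its limit and being OOM-killed / restarted.
	for {
		time.Sleep(time.Minute)
		buf[0]++
	}
}
```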
IMHO such changes and test(s) verifying that workloads behave as expected [1] would belong to another PR though.

[1] Workloads that already got a device still work, but new workloads do not get devices until the plugin has successfully restarted.
If this is the case, then my main concern is covered. I was wondering if a
AFAIK the scheduler considers extended resources only when the pod is scheduled, and the kubelet plugin can do something when the container is created (e.g. assign a different device instead), but after that the pod is completely on its own in this regard.
From my side, I'm fine merging once @hj-johannes-lee confirms enough (manual) testing is done |
@tkatila Thank you so much. He is helping me with this matter now!
I verified that when a plugin is removed, the workload utilizing the resource will stay running. If another workload is scheduled, it will go into the Pending state.
With my limited testing, I'd say these current limits are good to go.
@tkatila thanks! @hj-johannes-lee the two commits belong in one, so please squash the commits and rebase.
Operator maturity level 3 requires cpu/memory requests and limits for operands. Add them to all plugins deployed by the operator.

Signed-off-by: Hyeongju Johannes Lee <[email protected]>
Force-pushed from eb88c82 to 3b08a90