replace kill_all_container_shims with remove_all_containers even in classic #4693
Hello. Make sure to sign the CLA. I've been looking into this and experimented with what's being reported and proposed here. I got different results, so it would be useful to know your setup (node OS, microk8s version, etc.) so we can compare our results. Here are my setup and my findings. I ran these tests in Hyper-V VMs (Windows 11 Pro), using Ubuntu 22.04 nodes, and I've been trying out v1.31 (including building the snap using this PR):
As for my results, I first tried the
I've then tried the
From the looks of it, Another interesting note here, is that it seems that
As mentioned, I've also built the snap myself using this PR and tried it out. I suspected that, because of the changes proposed here, the
As for Best regards, Claudiu |
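Got here due to observing that the kubelite process was apparently leaking memory at a rate of 0.1 GB/day. When running microk8s stop, as expected, the kubelite process stopped, but I can still see all services (argocd, calico, ...) running in the background in top. The kubelite process and some others were stopped/killed, but that did not stop the containers. I have a DIY memory/CPU resource consumption monitor running (a Python script running as a DaemonSet) which kept reporting memory consumption, showing a large drop when the kubelite process went down and then showing a slight increase again when I ran microk8s start a minute ago. I'm running the snap in --classic mode. This is on microk8s 1.30.5, running on bare metal ubuntu-server 24.04.01. Happy to provide any requested debugging info.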
|
If you try with microk8s 1.35/candidate, which has #4710 backported, it should just work, as @claudiubelu explained. Are you able to give it a try by any chance? |
Do you mean 1.31/candidate? I don't see a 1.35.
On Thu 31 Oct 2024 at 09:48, Gaël Goinvic ***@***.***> wrote:
… Got here due to observing that the kubelite process was apparently
leaking memory at a rate of 0.1 GB/day. When running microk8s stop, as
expected, the kubelite process stopped, but I can still see all services
(argocd, calico, ...) running in the background in top.
The kubelite process and some others were stopped/killed, but that did not
stop the containers. I have a DIY memory/cpu resource consumption monitor
running (a python script running as a Daemonset) which kept reporting
memory consumption, showing a large drop when the kubelite process went
down and then showing a slight increase again when I ran microk8s start a
minute ago.
I'm running the snap in --classic mode. This is on microk8s 1.30.5,
running on bare metal ubuntu-server 24.04.01.
Happy to provide any requested debugging info.
$ sudo systemctl kill snap.microk8s.daemon-kubelite.service --signal=SIGKILL
Failed to kill unit snap.microk8s.daemon-kubelite.service: Unit snap.microk8s.daemon-kubelite.service not loaded.
$ sudo systemctl kill snap.microk8s.daemon-containerd.service --signal=SIGKILL
Failed to kill unit snap.microk8s.daemon-containerd.service: Unit snap.microk8s.daemon-containerd.service not loaded.
If you try with microk8s 1.35/candidate, which has the #4710 backported, it should
just work just as @claudiubelu explained.
Are you able to give it a try by any chance?
|
Yes sorry, typo : |
Not seeing much difference unfortunately after upgrading one node to 1.30.6.
I tried once before getting this 'pretty' output so that's why the pod reports running for 52 seconds. It restarts once While typing this I got the idea to check from another node what the status is - and it correctly (but to me unexpectedly) reports a few pods still running on "stopped" node lenovo-01.
Trying to drain the node hung for ~3 minutes until I cancelled it; this completed in ~45 seconds when draining the ("not
After I then thought "maybe because my mqtt app is running as a DaemonSet", as I saw these kept running normally after draining a node anyway. However the argocd ReplicaSet, which terminates with a
and the pods are visible from the lenovo-02 node:
After restarting the lenovo-01, as before the pods all terminate and get restarted. Interestingly the DaemonSet pods truly 'restart' (as registered by the
|
To be honest, not sure how the #4710 would do anything for this in my case - it seems to fix conflicting PATHs but I've never upgraded before today so I do not have an 'old snap' to conflict with anything, the microk8s install I did was using the ubuntu server wizard. That's also why I'm on 1.30 instead of 1.31, which I see now had been out for a while already when I first installed. |
This seems very worrisome. I don't have a multi-node cluster handy. I think both #4691 and #3969 should be re-opened as there are clearly still issues with @claudiubelu I'm a bit uncertain about why you can't reproduce the issue we are seeing here. Are you able to give it one more try, with the classic version? Is it somehow getting better between v1.30 and v1.31 (but I don't see why)? Also, I'm not sure why the
|
Hello, Addressing things as I go through them.
Just a quick note here. From what I can see, As for minimizing your memory leak issue, there might be something you can do right now, depending on your environment. For example, if you have a multinode cluster, you can have 3 nodes joined as a control plane (for HA), and any other node above that to join only as worker nodes. As it can be seen in [1] microk8s/build-scripts/components/kubernetes/patches/v1.27.0/0000-Kubelite-integration.patch Line 11 in e555997
[2] kubernetes/kubernetes#122725 [3] kubernetes/kubernetes#126962
Another small note here. Stopping / killing kubelite is unrelated to containerd and its containers. It's an entirely separate, independent process from it. Basically, the Now, going a bit further to the containerd level. Technically, stopping the Also, another important note here: if you want to kill all the containers, you have to stop the containerd service, and kill it:
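A minimal sketch of that stop-then-kill order, not necessarily the exact commands used in this comment. The unit name comes from the systemctl output quoted earlier in the thread; the stop_then_kill helper and the RUNNER variable are hypothetical names added for illustration (setting RUNNER=echo prints the commands instead of running them).

```shell
# Hypothetical helper: stop a systemd unit, then SIGKILL what is left in it.
# RUNNER is an illustration-only knob: it defaults to sudo, and RUNNER=echo
# turns the function into a dry run that only prints the commands.
stop_then_kill() {
    # Step 1: stop the service through systemd.
    ${RUNNER:-sudo} systemctl stop "$1"
    # Step 2: send SIGKILL to the processes systemd still tracks for the unit.
    ${RUNNER:-sudo} systemctl kill "$1" --signal=SIGKILL
}
```

A dry run on any machine would be `RUNNER=echo stop_then_kill snap.microk8s.daemon-containerd.service`; without RUNNER set, both steps run through sudo.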
The reason is simple: if you stop the service, the containers will keep running as previously mentioned; if you only kill the service, due to the
Those are some weird error messages, and it's probably why the
That error goes away if I fix the mistakes and run
Hmm, a bit of a wild guess here, but are you just checking whether a Pod is
They're in
The output above also includes an nginx deployment, which was created with the following command:
Now, on
This happens because Kubernetes reports what it actually thinks the state of world is, and relies on services such as
As you can see, there's a new nginx Pod running on a different node, and the old Pod is stuck in a On the
This issue could probably be resolved by the proposal made in this PR (I'd have to test), but this PR would instead leak containerd and containerd-shims as previously mentioned, which isn't great either. So, here's an idea: how about we call both
I might not be understanding this properly, it isn't clear to me: were you trying to drain a node which was stopped? (with
Ye, daemonsets are not drained / evicted, since by definition they're meant to run as one instance per node that respects the scheduling limits imposed on it (labels, taints, tolerations).
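To see which Pods on a node fall into that category, something along these lines could help. It is only a sketch: filter_daemonset_pods and daemonset_pods_on_node are hypothetical helper names, KUBECTL is an assumed override variable (defaulting to microk8s kubectl), and Pods without any owner reference simply show an empty owner kind.

```shell
# Keep only the Pods whose owner kind is DaemonSet.
# Input (stdin): one "namespace/name<TAB>ownerKind" pair per line.
filter_daemonset_pods() {
    awk -F '\t' '$2 == "DaemonSet" { print $1 }'
}

# Hypothetical wrapper: list DaemonSet-owned Pods scheduled on node "$1".
# KUBECTL is an assumption; override it if kubectl is installed separately.
daemonset_pods_on_node() {
    ${KUBECTL:-microk8s kubectl} get pods --all-namespaces \
        --field-selector "spec.nodeName=$1" \
        -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.metadata.ownerReferences[0].kind}{"\n"}{end}' \
        | filter_daemonset_pods
}
```

For example, `daemonset_pods_on_node lenovo-01` would print the Pods that a drain leaves in place on that node.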
That's because they haven't been restarted, but they've been recreated instead. That's why you'll see that those Pods are now young (their age is 2m17s and 24s respectively). And that's pretty normal. Pods / containers (or more precisely, "stateless" containers) are typically meant to be ephemeral in the first place, losing one and getting another to replace it should be a trivial matter with no consequence. May be a different story for
I only checked against containerd and containerd-shims, and did not include the binaries executed in the containers in my analysis. That's my bad. Will try a variant in which both Best regards, Claudiu Belu |
@claudiubelu Thanks, learned a few things from your extensive response :-) I've been using K8s (OpenShift specifically) professionally as a developer for 6 months now, and only recently started experimenting with a local cluster to understand more of what's under the hood. I was not aware that issuing e.g. Regardless, I take it that we agree that issuing |
Hello, I've tried the following changes (on top of this PR): claudiubelu@3e2aba6, and I've built microk8s classic snap and tried it:
There were a few more changes than I originally thought, but it seems that there are no more leaked containers / containerd shims with the changes I've mentioned above. The change also contains an additional test which checks that Even with my changes, on upgrade (snap refresh), the previous containers will still continue running. I haven't treated the upgrade scenario, and I don't think we should. In the best case scenario, an upgrade should not impact the workload Pods / containers and their uptime, thanks to the fact that
You're welcome. :) Regarding the 'cached response' part... I agree, that can be misleading at first, but it's a reasonable implementation, if you think about it: you can easily have hundreds of Pods deployed even in a small cluster, and checking the actual state of each and every single one of them (+ if you consider a reasonable timeout for each Pod)... it would take forever to run something like Best regards, Claudiu Belu |
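Since the reported Pod status is the last known state rather than a live probe, one rough way to judge how much to trust it is to look at the node's Ready condition first. The sketch below assumes exactly that; node_ready_status and pod_status_trust are hypothetical helper names, and KUBECTL is an assumed override variable.

```shell
# Hypothetical helper: print the status of the node's "Ready" condition
# ("True", "False" or "Unknown") for node "$1".
node_ready_status() {
    ${KUBECTL:-microk8s kubectl} get node "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Map that status to a rough verdict on the Pod statuses reported for the node.
pod_status_trust() {
    case "$1" in
        True)          echo "fresh" ;;           # node is reporting in normally
        False|Unknown) echo "possibly-stale" ;;  # node not ready or not heard from
        *)             echo "unknown-input" ;;
    esac
}
```

Running `pod_status_trust "$(node_ready_status lenovo-01)"` from another node gives a quick hint whether Pods listed for lenovo-01 may only reflect the last state that was reported.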
Summary
Current behaviour is very dangerous: we've seen data corruption and file lock issues after microk8s upgrades because container processes become duplicated. (Processes from before the upgrade are still present, and the upgraded microk8s starts new containers writing to the same resources.)
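A rough way to spot such leftovers is to count containerd-shim processes once microk8s is supposed to be stopped. This is only a sketch: count_shims is a hypothetical helper name, and it assumes the shim binary path contains the string containerd-shim.

```shell
# Count containerd-shim processes in `ps -eo pid,args` style input (stdin).
# Field 1 is the PID, field 2 is the executable path of the process.
count_shims() {
    awk '$2 ~ /containerd-shim/ { n++ } END { print n + 0 }'
}
```

On a node, `ps -eo pid,args | count_shims` should print 0 after a stop that really removed all containers.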
Changes
Replace kill_all_container_shims with remove_all_containers, regardless of classic or strict confinement
Testing
microk8s stop command patched and asserted that no containers were present anymore.
Before:
After:
Possible Regressions
Checklist
Notes