Replies: 3 comments
-
Will these work?
Pruning objects to reclaim resources: <https://docs.okd.io/4.11/applications/pruning-objects.html>
Freeing node resources using garbage collection: <https://docs.okd.io/4.11/nodes/nodes/nodes-nodes-garbage-collection.html>
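For reference, the first page covers the oc adm prune family of commands. A rough sketch of the kind of invocations it describes (the retention values here are arbitrary examples, and image pruning needs cluster-admin plus a reachable registry):

```sh
# Dry run first: without --confirm, oc adm prune only reports what it would remove
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m
# Then prune for real
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm

# Old completed deployments and builds can be pruned the same way
oc adm prune deployments --keep-complete=5 --keep-younger-than=60m --confirm
oc adm prune builds --keep-complete=5 --keep-younger-than=60m --confirm
```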
-
This is expected. By default, kubelet only starts cleaning up unused images when disk usage reaches 90%, so the free disk space you see at any moment depends on each node's image history. Michael has linked to the docs pages if you want to tweak the image GC parameters.
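For reference, those image GC parameters are exposed through a KubeletConfig object targeting a MachineConfigPool. A minimal sketch (the name, pool label, and threshold values below are placeholders, not recommendations):

```sh
# Apply a KubeletConfig that tunes image GC for a labeled MachineConfigPool
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-image-gc
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: image-gc        # the target MCP must carry this label
  kubeletConfig:
    imageMinimumGCAge: 5m             # never collect images younger than this
    imageGCHighThresholdPercent: 80   # start image GC above this disk usage
    imageGCLowThresholdPercent: 75    # collect until usage drops below this
EOF
```

Note that applying a KubeletConfig rolls out a new MachineConfig, so the nodes in the affected pool will drain and reboot.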
-
Thanks, Michael and Vadim. I had earlier looked at the pruning objects page, but it did not seem useful as I did not have many completed pods. I had not consulted the GC page. On how GC works, there is also this Red Hat blog: https://cloud.redhat.com/blog/image-garbage-collection-in-openshift. None of the above discusses manual image removal, or getting a list of unused images on a node via oc or kubectl, which is possible for pods (and from the console you can see the difference between the total number of pods and the number of running pods). A GC dry run would also be useful, to see how much of the disk pressure is a non-issue.
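For what it's worth, here is a sketch of the kind of manual inspection I mean, using a node debug shell and crictl (oc itself does not list node-local images); this is not an official workflow:

```sh
# Open a debug pod on a node and switch into the host filesystem
oc debug node/<node-name>
chroot /host

# List the images on the node and how much space the image filesystem uses
crictl images
crictl imagefsinfo

# Remove every image not referenced by a running container
# (roughly what kubelet image GC would eventually do; use with care)
crictl rmi --prune
```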
-
I have a cluster that has been running for a while (608 days), and I recently noticed some odd disk usage. I sized the cluster as per the documents of the time and gave each node 120 GB of disk space (control plane: 32 GiB memory, 4 cores; workers: 128 GiB memory, 8 cores). The Rook-Ceph disks are on top of this, as is the registry.
I have also had the 4.11 memory/CPU leak issue and have to reboot some of the nodes every few days.
While looking at the memory leak, I noticed that the actual disk usage seemed abnormal and varied from node to node. This is most obvious on the control plane nodes, which are using 89 GiB, 87 GiB, and 21 GiB. There is no correlation between disk usage and the number of running pods.
It seems that the control plane nodes, at least, should have roughly the same disk usage.
A quick search of the docs did not uncover an easy way of pruning disk space that is not currently needed. Any ideas, before I dig more deeply into how to clean up wasted space on the nodes?
Thanks
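For context, the per-node numbers above can be compared with something along these lines (a sketch; the container storage path may differ on your install):

```sh
# Report container image/storage usage for every node
for node in $(oc get nodes -o name); do
  echo "== ${node} =="
  oc debug "${node}" -- chroot /host df -h /var/lib/containers/storage
done
```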