Rework % mem use / limit to use memory wss values #511

Open
Carus11 opened this issue Jul 5, 2023 · 1 comment

Comments

@Carus11

Carus11 commented Jul 5, 2023

If we examine a pod in the current /General/Perf/Container Utilization/% mem used over limit chart, the usage appears as:

image

However, if we open the /General/Kubernetes/Compute Resources/Pod graph, the usage is far lower:

image

In /General/Perf/Container Utilization/Mem the attribute used is:
container_memory_usage_bytes

In /General/Kubernetes/Compute Resources/Pod the attribute used is:
container_memory_working_set_bytes

Overall, we can't use the /General/Kubernetes/Compute Resources/Pod dashboard because I want to compare all compute sessions that run overnight overlaid on top of each other, and it would take too long to go through them one by one.

We would prefer the /General/Perf/Container Utilization/Mem chart to use container_memory_working_set_bytes instead.
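
To make the request concrete, here is a rough sketch of what a working-set-based "% mem used over limit" query could look like. I have not seen the actual query behind the Perf dashboard, so this is an assumption throughout: the helper name `mem_percent_of_limit_query`, the choice of `container_spec_memory_limit_bytes` as the limit metric, the `container!=""` selector, and the grouping labels are all illustrative. The Python only assembles the PromQL string; it does not talk to Prometheus.

```python
def mem_percent_of_limit_query(
    usage_metric: str = "container_memory_working_set_bytes",
    limit_metric: str = "container_spec_memory_limit_bytes",  # assumed limit source
    selector: str = 'container!=""',  # assumed label filter, skips pod-level series
) -> str:
    """Build a PromQL expression for memory use as a percentage of the limit.

    Hypothetical helper: the real dashboard may group or filter differently.
    """
    group = "sum by (namespace, pod, container)"
    usage = f"{group} ({usage_metric}{{{selector}}})"
    # "> 0" drops containers with no memory limit, which report a limit of 0.
    limit = f"{group} ({limit_metric}{{{selector}}} > 0)"
    return f"100 * {usage} / {limit}"


if __name__ == "__main__":
    # Proposed chart query (working set) next to the current one (usage).
    print(mem_percent_of_limit_query())
    print(mem_percent_of_limit_query(usage_metric="container_memory_usage_bytes"))
```

Swapping only the `usage_metric` argument keeps the rest of the expression identical, so the overlaid view of all pods would still work.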

According to A Deep Dive into Kubernetes Metrics — Part 3 Container Resource Metrics | by Bob Cotton | FreshTracks.io: You might think that memory utilization is easily tracked with container_memory_usage_bytes, however, this metric also includes cached (think filesystem cache) items that can be evicted under memory pressure. The better metric is container_memory_working_set_bytes as this is what the OOM killer is watching for.
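
A small worked example of why the two charts can disagree so much. As far as I understand it, cAdvisor derives the working set by subtracting inactive file cache from total usage (floored at zero); treat that as my reading and not a guarantee. The byte values below are made up for illustration and are not taken from the screenshots.

```python
GIB = 2**30


def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Approximate working set: usage minus inactive file cache, never below 0."""
    return max(usage_bytes - inactive_file_bytes, 0)


def percent_of_limit(value_bytes: int, limit_bytes: int) -> float:
    """Express a memory value as a percentage of the container limit."""
    return 100.0 * value_bytes / limit_bytes


if __name__ == "__main__":
    # Hypothetical container: 4 GiB limit, 3 GiB usage, 2 GiB of it evictable cache.
    limit = 4 * GIB
    usage = 3 * GIB
    inactive_file = 2 * GIB

    wss = working_set_bytes(usage, inactive_file)
    print(percent_of_limit(usage, limit))  # percentage based on usage_bytes
    print(percent_of_limit(wss, limit))    # percentage based on working set
```

With these numbers the usage-based chart would show the container at 75% of its limit while the working-set-based chart shows 25%, which is the kind of gap seen between the two dashboards.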

@gsmith-sas
Member

Thanks for reporting this @Carus11. We ship a number of Grafana dashboards obtained from a number of sources. The Kubernetes/Compute Resources/Pod dashboard is one pulled from the Grafana community. The Perf/Container Utilization/Mem dashboard (like the other Perf/* dashboards) was developed by an internal testing team here at SAS focused on performance. As you have discovered, different dashboards surface different metrics, and depending on what you are trying to do or understand, some dashboards will suit some use cases better than others.

Unfortunately, due to resource constraints, we haven't been able to research and document where each dashboard is most useful and/or make all of the improvements to them we would like. However, we welcome feedback from people using the dashboards in real-world situations like yourself on which dashboards and metrics are the most useful or where there might be opportunities for improvements. That will help us prioritize the changes we do make.
