
Gradual Increase in Memory Consumption #653

Open
J0ram opened this issue Sep 11, 2024 · 4 comments
Comments

@J0ram

J0ram commented Sep 11, 2024

Hi,

Since 1.4.8 we have observed our shell-operator pods slowly consuming memory over time:
[screenshot: pod memory consumption growing over time]

I made a local branch with pprof installed and it appears to be logrus that is not releasing its memory:
[screenshot: pprof heap profile pointing at logrus allocations]
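For reference, pprof can be wired into a Go binary with something like this (not the exact shell-operator setup, just a minimal sketch; the side port is an arbitrary choice):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoints on a side port (arbitrary for this sketch).
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the application would run here ...
	select {}
}
```

A heap snapshot can then be pulled with `go tool pprof http://localhost:6060/debug/pprof/heap`, and two snapshots taken some hours apart can be compared with `-diff_base` to see which allocations keep growing.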

Environment:

  • Shell-operator version: 1.4.8 - 1.4.11 (we've tested each version on release)
  • Kubernetes version: AKS 1.29.4
  • Installation type: Helm

Worth noting that 1.4.7 behaves as expected on the same cluster.

Anything else we should know?:
I find it odd that nobody else is reporting this issue - I can only assume it's some oddity in our environment but I'm pretty much out of ideas.

From what I can see, the version of the logrus package hasn't changed between versions of this application (particularly 1.4.7 - 1.4.8). If you have any ideas on how we could debug further, that would be appreciated.

I've attached the heap dump in case it's of any help.

Thanks

heap.zip

@vladimirfx

Hit by this issue as well. Tried setting GOMEMLIMIT with no luck (then checked the Go version = 1.19, which does not support a soft memory limit).

Shell Operator: 1.4.12
K8s: 1.30.3
Linux Kernel: 6.6.52 with THP enabled in madvise mode (I think this is relevant for Go > 1.20)

Reproducer project: https://github.com/cit-consulting/hetzner-failoverip-controller
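For reference, on runtimes that do support a soft memory limit it can also be read and set programmatically via `runtime/debug`; a minimal sketch (the 512 MiB value is purely illustrative):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// A negative argument leaves the limit unchanged and just returns the
	// current value (whatever GOMEMLIMIT was set to, or effectively unlimited).
	current := debug.SetMemoryLimit(-1)
	fmt.Printf("current soft memory limit: %d bytes\n", current)

	// Illustrative value only: keep the Go heap plus runtime overhead
	// below ~512 MiB, leaving headroom under the container limit.
	debug.SetMemoryLimit(512 << 20)
}
```

That said, a soft limit only trades memory for GC CPU time; it won't help if something is genuinely leaking.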

@sidineyc

sidineyc commented Oct 2, 2024

Also hitting this.

Shell Operator: 1.4.10
K8s: 1.29.8

@kyale

kyale commented Oct 18, 2024

Same here with multiple operators running on different clusters using 1.4.10. The pod crashes and restarts when it hits the memory limit.
[screenshot: pod memory usage hitting the limit, 2024-10-18]

@vladimirfx

Checked 1.4.14 - classic memory leak:

[screenshot: memory usage graph, 2024-10-20]

Because of Go 1.22 and GOMEMLIMIT, the operator burns a lot of CPU on GC before being killed by the kubelet.
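One way to confirm that pattern is to log the runtime's own GC statistics next to the heap size, e.g. with a sketch like this (the 30-second interval is arbitrary):

```go
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	// Periodically log heap size, GC count, and the fraction of CPU time
	// spent in GC since the program started.
	go func() {
		var m runtime.MemStats
		for range time.Tick(30 * time.Second) {
			runtime.ReadMemStats(&m)
			log.Printf("heap=%d MiB numGC=%d gcCPU=%.2f%%",
				m.HeapAlloc>>20, m.NumGC, m.GCCPUFraction*100)
		}
	}()

	// ... the rest of the application would run here ...
	select {}
}
```

A steadily growing heap with gcCPU climbing right up to the OOM kill matches the behaviour described above.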
