Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: HA proxy memory runaway on certain rpm based distro's -> Setting maxconn in haproxy config (#15319) #18283

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

timgriffiths
Copy link

Fixes #15319

I propose the best way to fix this issue is by setting a global setting in haproxy config maxconn 4000 you can also fix it by changing the max open file limit in containerd but as this comment points out docker-library/haproxy#194 (comment) this only works as haproxy derives the max number of connections from the max open files on a system which seems like a bit of a bug or at least we should set a max as part of the config.

Setting the default sufficiently large so that this should not be a problem. this could be backported to whichever versions users need as it's a simple haproxy config tweak

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

Copy link

codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.07%. Comparing base (3160369) to head (6c35fb1).
Report is 864 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18283      +/-   ##
==========================================
+ Coverage   45.06%   45.07%   +0.01%     
==========================================
  Files         354      354              
  Lines       47963    47969       +6     
==========================================
+ Hits        21614    21624      +10     
+ Misses      23541    23538       -3     
+ Partials     2808     2807       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jennerm
Copy link

jennerm commented Jun 5, 2024

Hi @timgriffiths
Thanks so much for creating this fix. I am using the Helm chart to install Argo CD and I have been struggling with the issue of the ha-proxy pods getting OOMKilled on start up. I have successfully used your fix in the helm chart but I had to modify it slightly:

redis-ha:
  haproxy:
    extraConfig: |
      global
        maxconn 4096

I had to remove the colons from your original, otherwise the pod failed with an error:

Defaulted container "haproxy" out of: haproxy, config-init (init)
[NOTICE]   (1) : haproxy version is 2.9.4-4e071ad
[ALERT]    (1) : config : parsing [/usr/local/etc/haproxy/haproxy.cfg:85] : unknown keyword 'global:' in 'frontend' section
[ALERT]    (1) : config : parsing [/usr/local/etc/haproxy/haproxy.cfg:86] : unknown keyword 'maxconn:' in 'frontend' section; did you mean 'maxconn' maybe ?
[ALERT]    (1) : config : Error(s) found in configuration file : /usr/local/etc/haproxy/haproxy.cfg
[ALERT]    (1) : config : Fatal errors found in configuration.

Seems like you would need this change in your fix too?

Signed-off-by: Timothy Griffiths <[email protected]>
@timgriffiths timgriffiths reopened this Jun 12, 2024
@timgriffiths
Copy link
Author

Thank you. @jennerm clearly had yaml on the brain when I copied that fix from my environment. It's fixed now.

@Dawnflash
Copy link

I can confirm this fixes the issue for us on AL2023 EKS nodes.

@pre
Copy link

pre commented Sep 27, 2024

We're seeing an issue where argocd-redis-ha-haproxy gets OOMKilled when all of the redis proxies are unreachable.

I verified, and haproxy consumes all available memory in a matter of seconds and then OOM Kill happens.

Limiting with maxconn 4096 fixes the OOM kill and allows haproxy remain in the failure loop peacefully.

This PR is only about Helm. The maxconn setting should be in all of the default manifests including https://raw.githubusercontent.com/argoproj/argo-cd/${VERSION}/manifests/ha/install.yaml

@@ -701,6 +701,10 @@ data:
stats enable
stats uri /stats
stats refresh 10s
# Additional configuration
global
maxconn 4096
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the value already being used by some code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding here is this value used to be based on the max file descriptors the kernel would give you, in RHEL7 and 8 I believe this used to be 4096 (hence me using this value) in RHEL9 and i believe some other distro's this limit is removed and ha-proxy just tries to use all the file descriptors on the box until it runs out of memory now unless you set maxconn in it's config.

I don't see any harm in reducing the number but this was just chosen to be the absolute worst-case situation to maintain functionality

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Ready for final review
Development

Successfully merging this pull request may close these issues.

HA install's argocd-redis-ha-haproxy pods have runaway memory consumption
5 participants