fix: HA proxy memory runaway on certain rpm based distro's -> Setting maxconn in haproxy config (#15319) #18283

timgriffiths · 2024-05-20T01:22:57Z

I propose that the best way to fix this issue is to set a global maxconn 4000 in the haproxy config. You can also fix it by changing the max open file limit in containerd, but as this comment points out (docker-library/haproxy#194 (comment)), that only works because haproxy derives the maximum number of connections from the maximum number of open files on the system. That seems like a bit of a bug, or at least a reason to set a maximum as part of the config.

The default is set large enough that it should not cause problems. This could be backported to whichever versions users need, as it's a simple haproxy config tweak.

codecov · 2024-05-27T12:53:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.07%. Comparing base (3160369) to head (6c35fb1).
Report is 864 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #18283      +/-   ##
==========================================
+ Coverage   45.06%   45.07%   +0.01%     
==========================================
  Files         354      354              
  Lines       47963    47969       +6     
==========================================
+ Hits        21614    21624      +10     
+ Misses      23541    23538       -3     
+ Partials     2808     2807       -1

jennerm · 2024-06-05T16:48:10Z

Hi @timgriffiths
Thanks so much for creating this fix. I am using the Helm chart to install Argo CD and have been struggling with the haproxy pods getting OOMKilled on startup. I have successfully used your fix in the Helm chart, but I had to modify it slightly:

redis-ha:
  haproxy:
    extraConfig: |
      global
        maxconn 4096

I had to remove the colons from your original, otherwise the pod failed with an error:

Defaulted container "haproxy" out of: haproxy, config-init (init)
[NOTICE]   (1) : haproxy version is 2.9.4-4e071ad
[ALERT]    (1) : config : parsing [/usr/local/etc/haproxy/haproxy.cfg:85] : unknown keyword 'global:' in 'frontend' section
[ALERT]    (1) : config : parsing [/usr/local/etc/haproxy/haproxy.cfg:86] : unknown keyword 'maxconn:' in 'frontend' section; did you mean 'maxconn' maybe ?
[ALERT]    (1) : config : Error(s) found in configuration file : /usr/local/etc/haproxy/haproxy.cfg
[ALERT]    (1) : config : Fatal errors found in configuration.

Seems like you would need this change in your fix too?
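For anyone who wants to catch this before the pod starts, here is a minimal Python sketch of a heuristic check. It flags any line whose first token ends in a colon, which is what YAML-style keys look like once pasted into haproxy.cfg. The function name and the sample strings are hypothetical, and the "broken" sample is reconstructed from the error log above rather than copied from the original patch.

```python
def find_colon_keywords(cfg_text):
    """Return (line_number, token) pairs whose first token ends with ':'.

    Heuristic only: it assumes haproxy keywords never end in a colon,
    so a leading token like 'global:' is treated as a YAML-style leftover.
    """
    hits = []
    for number, line in enumerate(cfg_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and haproxy comments.
        if not stripped or stripped.startswith("#"):
            continue
        first_token = stripped.split()[0]
        if first_token.endswith(":"):
            hits.append((number, first_token))
    return hits


# Hypothetical samples: the YAML-flavoured version and the corrected one.
broken = "global:\n  maxconn: 4096\n"
fixed = "global\n  maxconn 4096\n"

print(find_colon_keywords(broken))  # [(1, 'global:'), (2, 'maxconn:')]
print(find_colon_keywords(fixed))   # []
```

This does not replace haproxy's own config validation; it only points at the specific mistake described in this comment.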

timgriffiths · 2024-06-12T04:17:55Z

Thank you @jennerm. I clearly had YAML on the brain when I copied that fix from my environment. It's fixed now.

Dawnflash · 2024-06-12T13:10:24Z

I can confirm this fixes the issue for us on AL2023 EKS nodes.

pre · 2024-09-27T13:59:18Z

We're seeing an issue where argocd-redis-ha-haproxy gets OOMKilled when all of the redis proxies are unreachable.

I verified that haproxy consumes all available memory in a matter of seconds, and then the OOM kill happens.

Limiting with maxconn 4096 fixes the OOM kill and allows haproxy to remain in the failure loop peacefully.

This PR is only about Helm. The maxconn setting should be in all of the default manifests including https://raw.githubusercontent.com/argoproj/argo-cd/${VERSION}/manifests/ha/install.yaml
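To check whether a given haproxy.cfg already carries the setting, something like the following Python sketch could be used. It assumes the haproxy.cfg text has already been extracted from the rendered manifest; the function name, the sample config, and the list of section keywords (a subset, not exhaustive) are illustrative assumptions.

```python
# Subset of haproxy section keywords, enough for this sketch (assumption).
SECTION_KEYWORDS = {
    "global", "defaults", "frontend", "backend",
    "listen", "userlist", "peers", "resolvers",
}


def global_maxconn(cfg_text):
    """Return the maxconn value set in a 'global' section, or None."""
    current_section = None
    for line in cfg_text.splitlines():
        # Drop trailing comments (naive: ignores '#' inside quoted strings).
        tokens = line.split("#", 1)[0].split()
        if not tokens:
            continue
        if tokens[0] in SECTION_KEYWORDS:
            current_section = tokens[0]
        elif current_section == "global" and tokens[0] == "maxconn" and len(tokens) > 1:
            return int(tokens[1])
    return None


# Hypothetical sample: a frontend-level maxconn must not be mistaken
# for the global one.
sample = """\
frontend stats
  maxconn 100
  stats enable
# Additional configuration
global
  maxconn 4096
"""

print(global_maxconn(sample))  # 4096
```

A result of None would mean the global limit is not set in that config text.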

andrii-korotkov-verkada · 2024-11-09T14:39:59Z

manifests/ha/base/redis-ha/chart/upstream.yaml

@@ -701,6 +701,10 @@ data:
      stats enable
      stats uri /stats
      stats refresh 10s
+    # Additional configuration
+    global
+      maxconn 4096


Is the value already being used by some code?

timgriffiths · 2024-11-12

My understanding is that this value used to be derived from the maximum number of file descriptors the kernel would give you. In RHEL 7 and 8 I believe this was 4096 (hence my choice of this value). In RHEL 9, and I believe some other distros, this limit has been removed, so haproxy tries to use all the file descriptors on the box until it runs out of memory, unless you set maxconn in its config.

I don't see any harm in reducing the number, but this was chosen to cover the absolute worst-case situation while maintaining functionality.
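To see which open-file limit a process actually gets on a given node, a minimal Python sketch like the one below can help. It only reports the limit for the process that runs it, so it is meaningful for haproxy only when run in an environment that inherits the same limits (for example a debug container on the same runtime, which is an assumption here). The helper names are hypothetical, and the resource module is Unix-only.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = nofile_limits()
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

A value far above 4096 would be consistent with the behaviour described in this thread, though it does not by itself prove how haproxy sizes its connection table.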

timgriffiths requested a review from a team as a code owner May 20, 2024 01:22

timgriffiths force-pushed the fix-#15319 branch from 23e1d5a to c854e62 Compare May 27, 2024 12:52

timgriffiths force-pushed the fix-#15319 branch from c854e62 to b52d1d6 Compare May 27, 2024 13:05

timgriffiths requested review from a team as code owners June 11, 2024 06:50

timgriffiths closed this Jun 12, 2024

timgriffiths force-pushed the fix-#15319 branch from a3524b2 to 3e2cfb1 Compare June 12, 2024 04:12

Fixing redis extra values

6c35fb1

Signed-off-by: Timothy Griffiths <[email protected]>

timgriffiths reopened this Jun 12, 2024

pre mentioned this pull request Sep 27, 2024

HA install's argocd-redis-ha-haproxy pods have runaway memory consumption #15319

Open

andrii-korotkov-verkada reviewed Nov 9, 2024

andrii-korotkov-verkada approved these changes Nov 15, 2024

andrii-korotkov-verkada added the ready-for-review label Nov 15, 2024
