concourse-summary will occasionally overwhelm our Concourse server with network connections to its atc processes. This causes the Concourse server's web workers to run out of file handles and be unable to function correctly.
Investigation reveals that our concourse-summary instance has ~1k ESTABLISHED connections to our Concourse server, and each of our two web instances has ~1k ESTABLISHED connections between the atc process running on the web instance and the concourse-summary instance. (Yes, this mismatch seems a little strange.)
As one would expect, the following procedure shuts down these ESTABLISHED connections and gets us back in working order:
Stop concourse-summary
Restart the atc service on each Concourse web instance
Wait for the atc services to come back up
Start concourse-summary
Expected Behavior
concourse-summary should keep open only as many network connections as it needs to get its job done. Given that fewer than 200 connections are open when we restart concourse-summary, ~1k connections seems to be too many.
More Details
We have seen this issue happen twice in the past ~four months. We do not currently know whether the number of ESTABLISHED connections increases gradually or jumps suddenly.
Our web instances are behind a GCP TCP Regional Load Balancer.
Our concourse-summary instance is providing a summary of both our Concourse server (version 4.2.1) and the Wings Concourse server (version 5.1.0). concourse-summary is deployed in a 2.4 PCF running on top of vSphere.
Unfortunately, we don't know what software (concourse-summary, Concourse, GCP Load Balancer) is at fault.
Hi Kenneth, I'm very interested in this problem. I will play around with my own instances, but if you are able to start logging ESTABLISHED connections over time, it would be interesting to know whether it is a slow burn or becomes problematic quickly.
We have some work in our backlog that will track this over time... we just need to get it prioritized. If this happens again (and we have the tracking in place), I'll make sure that we put details in this GH Issue.
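For reference, tracking of this kind could be sketched roughly as follows. This is only an illustration, not something taken from concourse-summary or Concourse: it assumes a Linux host where `netstat -tn` (net-tools) is available, and the function names `count_established` and `log_forever` are hypothetical. It counts ESTABLISHED TCP connections per remote address and prints a timestamped total at a fixed interval, which should be enough to tell a slow burn from a sudden jump.

```python
import subprocess
import time
from collections import Counter


def count_established(netstat_output: str) -> Counter:
    """Count ESTABLISHED TCP connections per remote host in `netstat -tn` output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if (
            len(fields) >= 6
            and fields[0].startswith("tcp")
            and fields[5] == "ESTABLISHED"
        ):
            # Strip the port; rsplit keeps IPv6 addresses intact.
            remote_host = fields[4].rsplit(":", 1)[0]
            counts[remote_host] += 1
    return counts


def log_forever(interval_seconds: int = 60) -> None:
    """Print a timestamped connection count every `interval_seconds` (hypothetical helper)."""
    while True:
        output = subprocess.run(
            ["netstat", "-tn"], capture_output=True, text=True, check=True
        ).stdout
        counts = count_established(output)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        # Total, followed by the five busiest remote hosts.
        print(stamp, sum(counts.values()), dict(counts.most_common(5)), flush=True)
        time.sleep(interval_seconds)
```

Calling `log_forever()` on the concourse-summary instance and on each web instance, with the output redirected to a file, would give a per-host time series to attach to this issue.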
@klakin-pivotal I suspect that this may be related to crystal-lang/crystal#8025. Unfortunately, that fix has not yet landed in a release, but it would be interesting to check whether the problem persists after the next Crystal release.