Too many open files #1317
Comments
Hm, FWIW, we've seen workloads up to 60K RPS per node on [...]. Could you just [...]?

I seem to get [...]
Hm, odd. I think the lxc link is a red herring. I just did this: [...]

This doesn't work for you?
hmm, no it doesn't :/

$ kubectl exec -it k8-ambassador-859f984bbf-glvdf -- /bin/sh
/ambassador $ ulimit -n 90000
sh: error setting limit: Operation not permitted
/ambassador $ ulimit -Hn
8192
/ambassador $ ulimit -Sn
2048

Ok, it sounds like if I can get the ulimit up, then Ambassador will probably work as expected?
I think this is an issue with EKS: pires/kubernetes-elasticsearch-cluster#215.
Ok, I can confirm that with the ulimit increase, Ambassador is working a lot better. For anyone else working with EKS who has trouble with ulimits: override the Docker daemon config JSON on the worker nodes (a sketch of that is below).
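For illustration only, a minimal sketch of that kind of node-level override, assuming the EKS worker runs Docker; the file path is the standard daemon config location, and the 65536 limit is a placeholder. Merge the key into any existing daemon.json rather than overwriting the file, since EKS AMIs ship other settings there.

```sh
# Hedged sketch: raise Docker's default nofile ulimit for new containers on an
# EKS worker node (e.g. via the node's user data), then restart the daemon.
# The limit value is illustrative; merge this key into any existing daemon.json.
cat <<'EOF' > /etc/docker/daemon.json
{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 65536,
      "Hard": 65536
    }
  }
}
EOF
systemctl restart docker
```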
Describe the bug
I seem to reach the open file limit fairly quickly, which causes pods to restart and leads to downtime. I see
[2019-03-15 00:14:11.730][000182][critical][assert] [source/common/network/listener_impl.cc:82] panic: listener accept failure: Too many open files
in the logs, and the container is restarted by Kubernetes. The load per Ambassador node is between ~1,300 and ~1,800 requests per second. The ulimit seems to default to 2048.
To Reproduce
Steps to reproduce the behavior: configure Ambassador to route the prefix /
to this service only, then drive load through it (a hedged sketch of such a routing setup is below). Our service's test endpoint returns 200 with a random latency of 0 to 1 sec.
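A minimal sketch of what such a routing setup could look like, assuming an annotation-style Ambassador Mapping; the service name, ports, and config version are illustrative placeholders, not the exact manifest used here.

```sh
# Hypothetical test setup: a Service annotated with an Ambassador Mapping that
# routes the prefix / to it. All names, ports, and versions are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: test-service
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: Mapping
      name: test_service_mapping
      prefix: /
      service: test-service
spec:
  selector:
    app: test-service
  ports:
    - port: 80
      targetPort: 8080
EOF
```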
Expected behavior
I didn't expect Ambassador pods to die/restart at that RPS. I was hoping to see throughput similar to our test service. Is this around the expected load for an Ambassador pod?
Also, the default ulimit seems pretty low. Is there a way to increase that? In our own Docker image, I was able to set the ulimit before starting the application (a sketch of that entrypoint pattern is below).
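For illustration, a minimal entrypoint sketch of that pattern; the limit value and the idea of wrapping the application command are assumptions about our own image, not what Ambassador itself does.

```sh
#!/bin/sh
# Hypothetical entrypoint: raise the soft nofile limit (bounded by the hard
# limit granted by the container runtime) before exec'ing the application.
ulimit -n 65536 || echo "warning: could not raise nofile limit" >&2
exec "$@"
```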
Versions (please complete the following information):
Additional context
I also ensured that the load balancing between Ambassador pods was equal.
Here is a graph of the tests from the perspective of the downstream service. The green line is the total RPS received by the service, and the lower lines are the load from each Ambassador pod.
The first green block is at 16,000 RPS total; once I bumped it up to 18,000, traffic no longer reached the downstream service because the Ambassador nodes were restarting.
The second green block is at 18,000 RPS total with 30 Ambassador nodes.
The last green block was a ramp-up test, which failed at around 40,000 RPS total with the same 30 Ambassador nodes.
Ambassador pods have 3Gi memory and 2000m CPU allocations to make sure those resources aren't bottlenecks.