Nginx "502 Bad Gateway" #130

Open
gothub opened this issue May 29, 2018 · 2 comments

Comments

@gothub
Contributor

gothub commented May 29, 2018

When many requests are sent to the k8s metadig deployment, Nginx responds with either:

504 Gateway Time-out or 502 Bad Gateway

The former message is received when 1000 requests are sent in less than 60 seconds.
The latter message is more common when 1000 requests are sent with a 1 second pause after every 10 requests.
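
For reference, the paced variant of the test can be sketched roughly as follows. This is a minimal illustration, not the actual test script: `send_paced` and the `send` callable (which posts one document and returns the HTTP status code) are hypothetical names, and the batch size and pause simply mirror the numbers above.

```python
import time
from collections import Counter
from typing import Callable, Iterable


def send_paced(documents: Iterable[str],
               send: Callable[[str], int],
               batch_size: int = 10,
               pause_seconds: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> Counter:
    """Send each document, pausing after every `batch_size` requests.

    `send` is a hypothetical callable that posts one document and returns
    the HTTP status code. The result is a tally of the status codes seen.
    """
    statuses = Counter()
    for count, document in enumerate(documents, start=1):
        statuses[send(document)] += 1
        # Pause once each full batch has been sent.
        if count % batch_size == 0:
            sleep(pause_seconds)
    return statuses
```

Tallying the returned status codes makes it easy to compare how many 502 and 504 responses each pacing produces. The `sleep` parameter is injectable only so the pacing can be checked without actually waiting.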

It appears that Nginx or the connection itself is becoming saturated.

BTW, the full test involves sending 10000 unique metadata documents to be quality scored. This test will be run as soon as these Gateway problems have been resolved.

@gothub gothub self-assigned this May 29, 2018
@gothub gothub added this to the 2.0.0 milestone Jun 6, 2018
@gothub
Contributor Author

gothub commented Jun 12, 2018

Note that the "Gateway Time-out" message is being received by the client (a Python script) that is sending requests to k8s. The NGINX instance (the k8s ingress), however, is logging messages like these:

2018/06/12 18:03:18 [error] 1878#1878: *896 upstream timed out (110: Connection timed out) while connecting to upstream, client: 192.168.25.64, server: docker-ucsb-1.test.dataone.org, request: "POST /metadig-webapp/suites/knb.suite.1/run HTTP/1.1", upstream: "http://192.168.158.24:80/metadig-webapp/suites/knb.suite.1/run", host: "docker-ucsb-1.test.dataone.org:30080"

which suggests that the Apache container is the bottleneck.
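
To check whether every timeout points at the same upstream, entries like the one above can be tallied with a short script. This is only a sketch: the regular expression is written against the single log line quoted here, and the function name `tally_upstream_timeouts` is made up for illustration. The phase text ("connecting to upstream") is worth keeping in the tally, since it appears to indicate that the timeout happened while the connection was being established rather than while waiting for a response.

```python
import re
from collections import Counter

# Matches the "upstream timed out" entries shown above; other NGINX error
# formats are deliberately ignored by this sketch.
TIMEOUT_PATTERN = re.compile(
    r'upstream timed out \((?P<errno>\d+): (?P<reason>[^)]*)\) '
    r'while (?P<phase>[^,]+), .*?upstream: "(?P<upstream>[^"]+)"'
)


def tally_upstream_timeouts(lines):
    """Count timeout entries per (phase, upstream scheme://host:port) pair."""
    tally = Counter()
    for line in lines:
        match = TIMEOUT_PATTERN.search(line)
        if match:
            # Keep only scheme://host:port so different request paths group together.
            address = "/".join(match.group("upstream").split("/")[:3])
            tally[(match.group("phase"), address)] += 1
    return tally
```

Running this over the ingress log would show at a glance whether all timeouts occur in the same phase and against the same address.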

The current configuration is "k8s ingress (NGINX)" -> Apache2 -> Apache Tomcat

The Apache2 container isn't really required here, as the k8s NGINX ingress can provide SSL/TLS termination and routing, so the next thing to try is a direct connection between NGINX and Tomcat.

@gothub
Contributor Author

gothub commented Jun 13, 2018

Update: the Apache2 container has been removed, so NGINX now sends requests directly to port 8080 of the container running Apache Tomcat. The metadig-engine controller is running in this container.

The '... upstream timed out' messages are still being printed by NGINX; however, all quality document requests are being processed and indexed into Solr.

I suspect that connections between NGINX and Tomcat are staying open, then timing out. It appears that the response from Tomcat isn't being received by NGINX.

@mbjones mbjones added the metadig All issues related to metadig label Jun 13, 2018
@gothub gothub modified the milestones: 2.0.0, 2.0.1 Sep 24, 2018
@gothub gothub modified the milestones: 2.0.1, 2.1.0 Jan 30, 2019
@gothub gothub modified the milestones: 2.1.0, 3.0 Apr 2, 2020
@jeanetteclark jeanetteclark removed this from the 3.0 milestone Jul 7, 2023