Consider upstreaming #111

Open · bprashanth opened this issue Dec 7, 2016 · 19 comments

Comments
@bprashanth

Since this project is based on nginx, we have an nginx backend (https://github.com/kubernetes/ingress/tree/master/controllers/nginx), and we're working on a generic controller with pluggable backends (https://github.com/kubernetes/ingress/blob/master/core/pkg/ingress/doc.go#L19), upstreaming seems like a good way to collaborate, for greater justice.

Even if you don't feel like it's worth your time right now, it might make sense to streamline how feed and the standard nginx backend handle nginx-specific features.

@craigbox

craigbox commented Dec 7, 2016

@balooo: Hello and thank you for your time today :-)

@jsravn
Contributor

jsravn commented Dec 8, 2016

This sounds good to me. I'll talk to the team and see what they think. We are going to be live in production very soon, so we'll have to stick with this project for at least the short term. We still have some work planned for feed that we need to do very soon (moving to client-go, using endpoints instead of services); after that we can look into switching to upstream.
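
For context on the endpoints work, a minimal sketch, assuming a current client-go release (the API at the time of this thread differed slightly) — not feed's actual code: reading a Service's Endpoints object so a proxy can target pod IPs directly instead of the Service VIP. The namespace and service name are placeholders.

```go
// Hedged sketch: list the pod addresses behind a Service via its Endpoints object.
// "default"/"my-service" are placeholder names; error handling is abbreviated.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes the controller runs in-cluster
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The Endpoints object shares the Service's name; each subset lists ready pod addresses.
	eps, err := client.CoreV1().Endpoints("default").Get(context.TODO(), "my-service", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				fmt.Printf("upstream %s:%d\n", addr.IP, port.Port)
			}
		}
	}
}
```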

@bprashanth
Author

FYI, the upstream interface should give you something like a snapshot of URLs + services + endpoints from its local cache on each update notification from the apiserver, so you can pick whether you want to go straight to the svc VIP, svc DNS, svc NodePort, or endpoints.
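
A rough sketch of the kind of pluggable-backend contract described above (hypothetical type names, not the actual kubernetes/ingress API): the generic controller builds a snapshot from its local cache on each apiserver notification and hands it to the backend, which decides whether to route via the service VIP, DNS name, NodePort, or endpoints.

```go
// Hypothetical pluggable-backend interface; names are illustrative only.
package backend

// Endpoint is a single pod address behind a service.
type Endpoint struct {
	IP   string
	Port int
}

// Upstream carries the different ways a backend could reach a service.
type Upstream struct {
	Name      string
	ClusterIP string     // service VIP
	DNSName   string     // e.g. my-svc.my-namespace.svc.cluster.local
	NodePort  int        // 0 if the service has no NodePort
	Endpoints []Endpoint // pod IPs, for backends that bypass kube-proxy
}

// Rule maps an ingress host/path to an upstream.
type Rule struct {
	Host     string
	Path     string
	Upstream Upstream
}

// Snapshot is what the generic controller would pass on each update notification,
// built from its local cache of ingresses, services, and endpoints.
type Snapshot struct {
	Rules []Rule
}

// Backend is implemented per proxy (nginx, haproxy, feed, ...). The controller calls
// OnUpdate with a fresh Snapshot; the backend renders its config and reloads itself.
type Backend interface {
	OnUpdate(Snapshot) error
}
```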

@jsravn
Contributor

jsravn commented Dec 16, 2016

@bprashanth We're probably going to back away from adding endpoints to nginx. The problem is that reloading nginx frequently (as would happen with endpoint updates) is quite problematic:

  • When keepalive connections are involved, clients will frequently get stale connections, as nginx simply closes any connection it considers idle on a reload/graceful stop rather than waiting for keepalive_timeout.
  • Memory usage can balloon very quickly under load, as old nginx workers stick around after a reload.

So I think using endpoints with nginx is a bad idea. Ideally, the problem of persistent connections + endpoint updates needs to be solved at the kube-proxy layer.

@bprashanth
Author

I'd expect both sides to follow the TCP protocol? As long as nginx sends a FIN and does the normal close handshake, the client should tear down the connection, keepalive or not, unless there's a stateful NAT in between that disagrees about TCP timeouts?

Maybe you can describe what you see on a tcpdump in a controlled environment (or is this too hard to hit in testing)?

@jsravn
Contributor

jsravn commented Dec 16, 2016

We're using an ELB in front of nginx, and we see the ELB return 504s under load when nginx does a reload. It isn't very many, though, because ingress updates are pretty rare. As the ELB is a black box, I can't see what it's doing.

I will try to duplicate this and capture logs to understand it better.

@jsravn
Contributor

jsravn commented Dec 16, 2016

I think something else may be going wrong, so I'll investigate further. Could be an ELB bug. I think somewhere in the ELB docs it says the ELB needs to close the backend connections. Akamai has a similar restriction.

@jsravn
Contributor

jsravn commented Dec 20, 2016

@bprashanth tcpdump shows the nginx host sending tons of TCP resets to the ELB when a reload occurs. These translate directly into 504s returned from the ELB. I can also reproduce it by hitting nginx directly with wrk using HTTP/1.1 keepalives: I get connection resets. So far nginx reload isn't as graceful as one would hope. I would expect it to close its connections gracefully, but it seems nginx simply closes the socket when idle rather than waiting for the connection to be terminated on the ELB/client side.

@bprashanth
Author

How are you reloading it? (http://nginx.org/en/docs/control.html)

@jsravn
Contributor

jsravn commented Dec 20, 2016

We send a HUP to the master process. I'm pretty sure this is all expected behaviour: HTTP/1.1 (RFC 2616) says clients should be able to handle asynchronous (non-graceful) closes. Otherwise nginx workers would have to wait for some grace period to be sure the connections close.

I spoke with AWS support and they suggested I use the new application load balancers instead. I tried one out, and indeed it works fine with nginx reloads: not a single 504. Apparently ALBs don't reuse persistent connections; instead, it looks like they batch multiple requests through a connection and then close it. That seems to solve the problem for us, so with an ALB I can start updating endpoints with confidence :).

@bprashanth
Author

An RST is still kind of rude; I get the feeling there's a more graceful FIN/FIN-wait-timeout option, but I'd have to dig through the nginx config a bit. Great that you have it working. Unfortunately, Service.Type=LoadBalancer doesn't set up an ALB, so maybe this is how we should finally manifest Ingress on AWS?

Right now it's an ELB created by Service.Type=LoadBalancer in front of the nginx ingress controller. Is setting up an ALB just like setting up an ELB plus one last step to create something like a URL map, or are all the resources in the pipeline completely different (as is the case on GCE)?

@jsravn
Contributor

jsravn commented Dec 20, 2016

An ALB seems very similar to an ELB; I didn't have to set up any URL mapping when I created one through the console. Unfortunately, it's a different API, so I'll have to modify feed to use it.

If you find a way to change nginx's behaviour, please let me know. I spent a lot of time looking at config options and its source code, and it seems to do as I describe: it closes the socket immediately.

Apparently you can improve things by issuing a USR2 followed by WINCH, which will actually wait for connections to drain, but USR2 doesn't seem to work in containers.
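
To make the two reload styles discussed in this thread concrete, here is a minimal Go sketch (a hypothetical helper, not feed's code): HUP for nginx's normal reload, or USR2 followed by WINCH to spawn a new master/worker set and let the old workers drain. The PID-file path and the one-second wait are illustrative, and whether USR2 behaves as expected inside a container is exactly the open question above.

```go
// Hedged sketch of driving nginx reloads from a Go controller via signals.
package main

import (
	"os"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// readPID reads the nginx master PID from its pid file.
func readPID(pidFile string) (int, error) {
	b, err := os.ReadFile(pidFile)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

// reload sends HUP: nginx re-reads its config and replaces workers, but old workers
// close idle keepalive connections immediately (the behaviour observed above).
func reload(masterPID int) error {
	return syscall.Kill(masterPID, syscall.SIGHUP)
}

// drainReload uses the binary-upgrade signals: USR2 starts a new master and workers,
// then WINCH asks the old workers to stop accepting and wind down their connections.
func drainReload(masterPID int) error {
	if err := syscall.Kill(masterPID, syscall.SIGUSR2); err != nil {
		return err
	}
	time.Sleep(time.Second) // crude: a real controller would poll for the new master
	return syscall.Kill(masterPID, syscall.SIGWINCH)
}

func main() {
	pid, err := readPID("/var/run/nginx.pid")
	if err != nil {
		panic(err)
	}
	// Plain reload for comparison:
	//   _ = reload(pid)
	// Drain-style reload:
	if err := drainReload(pid); err != nil {
		panic(err)
	}
}
```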

@bprashanth
Author

bprashanth commented Dec 20, 2016 via email

@jsravn
Contributor

jsravn commented Dec 20, 2016

I think the ALB is layer 7 only, so you couldn't do TCP load balancing with it.

I only tested with a layer 7 ELB, so for me the ALB was a drop-in replacement.

@bprashanth
Author

bprashanth commented Dec 20, 2016 via email

@justinsb

Ideally creating an Ingress on AWS should just work, no surprises, which apparently isn't the case right now since people are shoving a Service of Type=LB/ELB in front.

What is the alternative?

@bprashanth
Author

An ingress controller that spins up an ALB, like the GCE one? You can also tier them, so you have ALB -> nginx ingress. Or maybe we can find a way to set up a catch-all ALB instead of an ELB via annotations on the Service (not sure how similar they are in setup).

@jsravn
Contributor

jsravn commented May 5, 2017

Just an update:

  1. We've pretty much given up on nginx as an ingress proxy. It's very difficult to get seamless reloads out of nginx: it will drop requests roughly in proportion to how often you reload. After much research and spiking, we've come to the conclusion that haproxy is the superior, and only, viable solution for use as a k8s ingress proxy. We can reload many times a second without any dropped requests and no noticeable performance problems, unlike nginx. With https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ we don't even need to hack around the small issues in haproxy's reload.
  2. ALBs are also out, because we discovered that ALBs don't support connection draining. That means there is no way to do a zero-downtime redeployment of the ingress component. It's a huge deficiency in ALBs, and we had to work with AWS before we got to the bottom of the problem. It's not documented in the AWS docs; they have told us they raised an internal ticket to update them. In the meantime, we will probably go back to ELBs. ELB + haproxy should give us truly zero downtime in the face of even heavy ingress updates.

@jsravn
Contributor

jsravn commented Jul 6, 2017

Update #2: as ingress doesn't reload very often and the percentage of dropped requests is quite low, we haven't prioritised the haproxy work. Otherwise the nginx proxy works great.
