We see 502 errors when an apollo-server pod is gracefully terminated in k8s. We host apollo-server in EKS and expose it using AWS ALBs.
AWS provides a troubleshooting guide for 502s and we can see our issue falls under the criteria for "The load balancer received a TCP RST from the target when attempting to establish a connection" (see screenshot below).
We attempted to increase the stopGracePeriodMillis in ApolloServerPluginDrainHttpServer to be higher than both the Kubernetes terminationGracePeriodSeconds and the ALB Target Group deregistration_delay, but did not see a change in behavior.
We have also set httpServer.keepAliveTimeout and httpServer.headersTimeout higher than our ALB Session Timeout.
That is the default behavior (10s vs 60s in K8s). Our theory is that although K8s says it sends the SIGTERM and stops traffic at the same time, the two are not simultaneous (i.e. AWS does not tell the Target Group to stop sending traffic until after the SIGTERM is issued). And then any sessions using keepAlive
We added a preStop lifecycle hook in K8s that runs sleep 30 in the container spec, and we haven't encountered the issue again.
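For anyone trying the same workaround, a preStop hook of this kind might look roughly like the sketch below. The container name and image are placeholders, and the 60-second grace period is an assumption. This is not the exact spec from our deployment.

```yaml
# Minimal sketch of a pod template fragment with a preStop sleep.
# Container name, image, and the grace period value are assumptions.
spec:
  # The grace period countdown includes the time spent in preStop,
  # so it needs to be longer than the sleep plus the server's drain time.
  terminationGracePeriodSeconds: 60
  containers:
    - name: apollo-server                  # hypothetical name
      image: example/apollo-server:latest  # hypothetical image
      lifecycle:
        preStop:
          exec:
            # SIGTERM is only sent to the container after this command returns,
            # which gives the ALB time to deregister the target first.
            command: ["sleep", "30"]
```

The exec form requires a sleep binary in the container image. If the sleep plus the server's own drain period exceeds terminationGracePeriodSeconds, the container is killed before it finishes draining.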
Link to Reproduction
https://repost.aws/knowledge-center/elb-alb-troubleshoot-502-errors
Reproduction Steps