
Outbound HTTP endpoint metrics seem to miscount "ready" endpoints with circuit breaking #12961

Open
kflynn opened this issue Aug 14, 2024 · 3 comments


kflynn commented Aug 14, 2024

What is the issue?

I was trying to set up a Grafana dashboard to show circuit breaking behavior with the Faces demo: the Faces GUI calls through Emissary to the face workload at the entry point of this demo. I intentionally break the world by adding a face2 Deployment which always fails, and setting it up so that the face Service spans Pods created by both the face and face2 Deployments.
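For readers unfamiliar with the Faces demo, a minimal sketch of the overlap described above. The label key (`service: face`), image name, and ports are assumptions for illustration, not the actual Faces manifests:

```yaml
# Hypothetical sketch: the real Faces demo manifests may differ.
# The point is that the face Service's selector matches Pods from
# both the face Deployment and the always-failing face2 Deployment.
apiVersion: v1
kind: Service
metadata:
  name: face
  namespace: faces
spec:
  selector:
    service: face          # matches Pods from BOTH Deployments
  ports:
    - port: 80
      targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: face2
  namespace: faces
spec:
  replicas: 1
  selector:
    matchLabels:
      service: face
  template:
    metadata:
      labels:
        service: face      # same label as the face Deployment's Pods
    spec:
      containers:
        - name: face2
          image: example/face2:failing   # placeholder: a build that always fails
```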

At this point, you can do PromQL queries and see

2024-08-14T18:29:26.957000: emissary.emissary -> face.faces (pending): 0
2024-08-14T18:29:26.957000: emissary.emissary -> face.faces (ready): 2

This is correct: both endpoints are active and circuit breaking isn't involved. One would expect that when circuit breaking is turned on, the breaker opening would result in 1 pending and 1 ready. Unfortunately, what you actually get is

2024-08-14T18:29:36.993000: emissary.emissary -> face.faces (pending): 1
2024-08-14T18:29:36.993000: emissary.emissary -> face.faces (ready): 3

which is a bit surprising! Then, when the breaker is turned off, you get

2024-08-14T18:30:37.132000: emissary.emissary -> face.faces (pending): 0
2024-08-14T18:30:37.132000: emissary.emissary -> face.faces (ready): 4

So pending seems to work fine, but the ready endpoints seem to be miscounted.

How can it be reproduced?

Enable circuit breaking and force the breaker to open. Watch pending and ready endpoints as you go.
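One concrete way to do this, assuming the Linkerd failure-accrual annotations from the circuit-breaking documentation (treat the exact annotation names and threshold as assumptions to verify against your Linkerd version):

```shell
# Enable consecutive-failure accrual (circuit breaking) on the face Service.
kubectl annotate -n faces svc/face \
  balancer.linkerd.io/failure-accrual=consecutive \
  balancer.linkerd.io/failure-accrual-consecutive-max-failures=3

# Drive traffic so the always-failing face2 Pod trips the breaker,
# then watch the balancer gauges reported by the client's proxy:
linkerd diagnostics proxy-metrics -n emissary deploy/emissary \
  | grep outbound_http_balancer_endpoints
```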

Logs, error output, etc

See above. 🙂

output of linkerd check -o short

:; linkerd check -o short
Status check results are √

Environment

I'm using a kind cluster at the moment, K8s 1.30.3, Linkerd version edge-24.8.2.

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

@kflynn kflynn added the bug label Aug 14, 2024

kflynn commented Aug 15, 2024

Whoops, I should've added that those lines of output are from running this PromQL query

outbound_http_balancer_endpoints{deployment="emissary", namespace="emissary", backend_name="face", backend_namespace="faces"}

and then formatting the values coming back for each endpoint_state; of course the same data shows up in Grafana or whatever as well.
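For reference, a minimal sketch of that formatting step. The payload below is fabricated to match the shape of a Prometheus HTTP API instant-query response for `outbound_http_balancer_endpoints`; it is not real cluster data, and the formatter itself is an assumption about how the lines above were produced:

```python
# Hypothetical formatter turning a Prometheus /api/v1/query response into
# lines like "2024-08-14T18:29:26...: emissary.emissary -> face.faces (ready): 2".
import json
from datetime import datetime, timezone

# Fabricated sample response, shaped like a Prometheus instant-query result.
sample = json.loads("""
{
  "data": {
    "result": [
      {"metric": {"deployment": "emissary", "namespace": "emissary",
                  "backend_name": "face", "backend_namespace": "faces",
                  "endpoint_state": "pending"},
       "value": [1723660166.957, "0"]},
      {"metric": {"deployment": "emissary", "namespace": "emissary",
                  "backend_name": "face", "backend_namespace": "faces",
                  "endpoint_state": "ready"},
       "value": [1723660166.957, "2"]}
    ]
  }
}
""")

def format_result(result: dict) -> str:
    """Format one series as 'timestamp: client -> backend (state): value'."""
    m = result["metric"]
    ts, val = result["value"]
    when = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return (f"{when}: {m['deployment']}.{m['namespace']} -> "
            f"{m['backend_name']}.{m['backend_namespace']} "
            f"({m['endpoint_state']}): {val}")

for r in sample["data"]["result"]:
    print(format_result(r))
```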


adleong commented Aug 22, 2024

This looks like it might be similar to linkerd/linkerd2-proxy#2928


olix0r commented Sep 10, 2024

Are you able to provide the output of linkerd diagnostics proxy-metrics and kubectl logs against a client in this state? This should help shine light on the nature of the issue.

No branches or pull requests

3 participants