Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@opentelemetry/instrumentation-express handler does not run in correct context #2022

Open
aabmass opened this issue Mar 18, 2024 · 5 comments · May be fixed by #2638
Open

@opentelemetry/instrumentation-express handler does not run in correct context #2022

aabmass opened this issue Mar 18, 2024 · 5 comments · May be fixed by #2638
Labels
bug Something isn't working pkg:instrumentation-express priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect

Comments

@aabmass
Copy link
Member

aabmass commented Mar 18, 2024

What version of OpenTelemetry are you using?

├── @opentelemetry/[email protected]
└─┬ @opentelemetry/[email protected]
  ├── @opentelemetry/[email protected] deduped
  └── @opentelemetry/[email protected]

What version of Node are you using?

v18.19.1

What did you do?

Run the included express sample https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/examples/express

What did you expect to see?

Child spans of the handler should be children under the request handler span

What did you see instead?

Child spans created under the request handler fall under the top level server span (GET /run_test), which I believe is created by the @opentelemetry/instrumentation-http. I.e. they appear as siblings of the server span, next to middleware. This is really confusing because you can't tell if the span belongs to a middleware or the handler.

This screenshot demonstrates the problems
image

Additional context

It looks like the wrapped handler is not executed with the created span set in context:

Maybe this is intentional for 3-arg handlers that accept a next() function, as the next layer would be running in the parent's span context. If that is considered incorrect, you could probably solve this by wrapping the next() function to restore the parent context before executing the next layer to keep everything as siblings.

@aabmass aabmass added the bug Something isn't working label Mar 18, 2024
@pichlermarc pichlermarc added the priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect label Mar 20, 2024
@JamieDanielson
Copy link
Member

If I am understanding this issue correctly, I believe this was discussed on #1657 and we were torn on how best to approach a change here. As mentioned in this comment, the result of changing the nesting of spans significantly changes the visualization in a trace, which caused some concern. As traces get longer and spans become more nested, it becomes nearly impossible to view those spans meaningfully. For example, here is what the example app looked like in Zipkin after making a change.

zipkin current and proposed behavior

This is worth further discussion, though, so I appreciate the issue being raised. Please also let me know if I am misunderstanding the issue described here.

@aabmass
Copy link
Member Author

aabmass commented Mar 26, 2024

@JamieDanielson thanks for the context. I think the naive change would create the nested "proposed" result. I agree it's not ideal and not necessarily even more correct since we should be able to flatten out the callback hell.

Maybe we could hack around this by explicitly setting the parent span to the one created by @opentelemetry/instrumentation-http to break the nesting.

Do you know if/how we already solve this for other callback (non-promise) APIs like many of node's internal APIs that we instrument? I know we have de-coupled OTel context from the AsyncLocalStorage impl but it is used in Node in practice–I'm not an expert here but I think AsyncResources is supposed to let async frameworks manage context correctly (see here). This old issue may be relevant expressjs/express#4396 too.

@raphael-theriault-swi
Copy link
Contributor

Going to use this issue to collect some thoughts and throw some ideas around wrt how the express instrumentation currently handles span hierarchy and context propagation. I'll probably be at the SIG tomorrow if others want to maybe have a discussion about this and figure out the best path forward.

Request handlers

Request handlers are currently not instrumented properly and always produce bogus spans of little value. I don't know what the historical reason for that is but the current status is pretty awful. They also don't report errors at all at the moment.

I think in the case of request handlers we definitely want to propagate the context. It would be a big change in how traces are displayed but it wouldn't dramatically increase nesting if it stays a direct child span of the root express server span, and overall I really don't think anyone would find this makes anything less clear.

Middleware

They're properly instrumented for the most part, but also don't propagate context. I'd argue we'd also want them to do that, so that for instance an authentication middleware that queries a database actually creates that query span as a child of the middleware span. Since we agreed nesting middleware renders things unreadable at scale, I like the proposed solution of patching next to restore the parent context. We'd mostly keep the current way traces are displayed with a readability boost for complex middleware that call into other stuff.

Router spans

Similarly to route handlers spans these are bogus. Do we want the route handler to be a child of the router or keep things as is ? Don't really have an opinion here apart from "we should actually measure their real duration at least".

@aabmass
Copy link
Member Author

aabmass commented Mar 27, 2024

I see this was in the SIG agenda but I had a conflict. Was there any resolution?

@raphael-theriault-swi
Copy link
Contributor

We agreed that context should probably be propagated for both but need to make sure it actually wouldn't break existing uses. I'll probably implement the proposed changes to have a way to compare both in practice too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pkg:instrumentation-express priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect
Projects
None yet
4 participants