Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sentry-tracing): switch sentry spans on enter and exit #686

Merged
merged 3 commits into from
Nov 29, 2024

Conversation

saiintbrisson
Copy link
Contributor

@saiintbrisson saiintbrisson commented Sep 25, 2024

I've been facing a bug with Sentry Trace where concurrent spans end up wrongly nested. At first, I thought this was some strange autogrouping bug and event sent a feedback (sorry Sentry support member 😅).

After digging a little bit, I discovered that the current tracing layer implementation sets the sentry span as active as soon as a tracing span is created, which should not be the case, as spans are only active once entered. Following the same logic, it also reverts to the parent span only when a span is closed, and not when it's exited.

A span can be entered and exited multiple times, for example, when a tracing Instrumented future polls and returns Pending. In case of concurrent Instrumented futures, the current impl ends up nesting Futures under the first created one.

sentry-tracing layer (wrong)

Screenshot 2024-09-25 at 14 22 57

tracing-opentelemetry layer (correct)

Screenshot 2024-09-25 at 14 28 11

I'd say that the best idea would be to not use sentry_core::configure_scope(|s| s.get_span()); when creating a new span to get the parent, and get it from the span extensions instead, but the current behavior might be relied upon already by someone.

@saiintbrisson saiintbrisson force-pushed the master branch 3 times, most recently from 7a4b72c to a138e3c Compare September 27, 2024 01:26
@saiintbrisson saiintbrisson marked this pull request as draft September 27, 2024 10:04
@saiintbrisson saiintbrisson force-pushed the master branch 3 times, most recently from 7a4b72c to 6f19f1d Compare September 27, 2024 10:14
@saiintbrisson saiintbrisson marked this pull request as ready for review September 27, 2024 10:14
@szokeasaurusrex szokeasaurusrex self-requested a review September 30, 2024 08:30
@Swatinem
Copy link
Member

In case of concurrent Instrumented futures, the current impl ends up nesting Futures under the first created one.

For those cases, one should usually fork the Sentry Hub in order to have them fully isolated.

While you are right that enter/exit is a better fit, and when someone is manually hooking up spans, this distinction can lead to some confusion.
However, Hub isolation is the proper solution to this. Hooking into enter/exit in order to set/unset the active Span in the Sentry scope would not fully fix the problem of concurrently running futures.

@szokeasaurusrex
Copy link
Member

Hi @saiintbrisson, are you able to solve this problem with hub isolation?

I would honestly be a bit hesitant to change how we enter and exit Sentry spans, since others might be relying on the current behavior. But, we can look into it if you are unable to find a way to work around this problem.

@saiintbrisson
Copy link
Contributor Author

saiintbrisson commented Oct 2, 2024

Hi @saiintbrisson, are you able to solve this problem with hub isolation?

I would honestly be a bit hesitant to change how we enter and exit Sentry spans, since others might be relying on the current behavior. But, we can look into it if you are unable to find a way to work around this problem.

I ended up just using my fork at my company. Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand. I already isolate it for my root future (I have a Axum middleware binding a hub to the handler call).

The problem arises on child futures, like join!ing two function futures. If both functions are tracing::instrumented, the first one will create a span when first polled, and when the second one is polled, its span will be considered a child to the first function, because the first span will only end after it returns Poll::Ready, which seems objectively wrong to me, as shown in the prints. That is, tracing::Span::current is not the same one as sentry::configure_scope(|s| s.get_span()), and I find this rather confusing.

Probably manually binding a hub to every call inside my join would work, but I don't want to make my coworkers handle this kind of detail during development. We have a couple of hundreds of mentions of try_join! on our code, and general code is agnostic to our telemetry stack (everything is handled through tracing).

If this happens to be a wont-fix, I'll probably just create my own tracing layer with this impl, but I do think sentry-tracing would benefit from this fix.

@Swatinem
Copy link
Member

Swatinem commented Oct 2, 2024

Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand.

Yes that is a bit unfortunate, but ultimately the correct thing to do. Apart from proper span hierarchy, this also relates to Sentrys Scope and other concepts, which could go out of sync if multiple concurrent futures are messing with them.

@saiintbrisson
Copy link
Contributor Author

saiintbrisson commented Oct 2, 2024

Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand.

Yes that is a bit unfortunate, but ultimately the correct thing to do. Apart from proper span hierarchy, this also relates to Sentrys Scope and other concepts, which could go out of sync if multiple concurrent futures are messing with them.

I don't think I understand how Scopes are to be used. I currently create a new Hub, create a transaction and set it to the created Hub, that I than use to bind it to the handler call (one scope per request). The way you described it, it seems like every future should have its own Scope?

In any case, I expected sentry-tracing to behave like what tracing-subscriber defines, thus my confusion as someone familiar with other impls (like tracing-opentelemetry). Because I use the tracing layer, I was wishing I could forget sentry Scopes altogether in favor of tracing spans/events/instrumentation. The current behavior should probably be described in the README for the crate: https://github.com/getsentry/sentry-rust/tree/master/sentry-tracing#tracing-spans, it's not clear that running these futures concurrently will provoke unexpected (faulty?) results.

Let me know if you folks come to any conclusion about this PR. For now, I'll create my own layer impl and move the fn get_span_idto #687.

Thanks for your time and keep up the good work, we've been enjoying Sentry a lot around here :).

@Swatinem
Copy link
Member

Swatinem commented Oct 3, 2024

All in all, thank you for the great feedback, this is very valuable.

I don't think I understand how Scopes are to be used. I currently create a new Hub, create a transaction and set it to the created Hub, that I than use to bind it to the handler call (one scope per request). The way you described it, it seems like every future should have its own Scope?

I might have added a bit to the confusion here. The Rust SDK still has a couple of concepts that have since been removed from the wider "Sentry unified API". For example, there are no Hubs anymore, and no Transactions either.
There is just Scopes and Spans. Though the Rust SDK is unfortunately quite behind on all of this :-(

Anyhow, in essence you want every future to have its own Scope (which in the Rust SDK is still named Hub) as you want to have isolation between them, for example:

  • When you set tags like file_id, you want those to be unique to each concurrently running future, as any error being logged should have the correct file_id.
  • Similarly, log messages which are being converted to breadcrumbs should also be isolated between concurrent futures. Otherwise a log line like "Starting to process file X" would be intermingled with other breadcrumbs unrelated to the operation.

The #[tracing] macro conveniently wraps the resulting future in Instrumented as you noted, which makes sure to push/pop the currently active span to a thread local. Sentry is completely outside of that, so you would have to do the equivalent for the Sentry Hub manually. Which then does the same management of a thread-local with appropriate state.

As I mentioned, the Rust SDK is quite a bit behind in terms of features. By now, the Sentry JS SDK has fully embraced opentelemetry, the SDK and instrumentation side that is. I could maybe imagine Sentry doing something similar in the sense of not duplicating all this span tracking, etc, but fully embracing tracing and build on top of that. Though that would be a ton of work. :-D

@nrxus
Copy link
Contributor

nrxus commented Oct 3, 2024

I think this is useful even outside of Futures.

My application doesn't do any async but whenever we create a new thread we also create a new tracing span for that thread. The span is created before the new thread (to inherit the current span as the parent) but then only enter it inside the new thread. If I am understanding this issue correctly, currently any events between the creation of the span and the entering of the span will have the wrong span information. While I understand this is a change in behavior I think the current behavior is fairly unexpected and not intuitive.

@DoumanAsh
Copy link
Contributor

DoumanAsh commented Oct 30, 2024

I just noticed that my PR #700 does the same thing

So I might as well close it, but I would like to point few obvious things with the way sentry does things.
Sentry hubs are stored in static variables so that's actual root of the issue for tracing implementation

Moving scope change to enter/exit will solve majority of issues aside from multi threaded environments.
To work it around you can also just pin tasks to threads (work stealing runtimes were a mistake) or use bind_hub https://docs.rs/sentry/0.34.0/sentry/trait.SentryFutureExt.html#method.bind_hub

But you cannot make tracing span works without changing scope setting to the on enter/exit

@saiintbrisson
Copy link
Contributor Author

I just noticed that my PR #700 does the same thing

So I might as well close it, but I would like to point few obvious things with the way sentry does things. Sentry hubs are stored in static variables so that's actual root of the issue for tracing implementation

Moving scope change to enter/exit will solve majority of issues aside from multi threaded environments. To work it around you can also just pin tasks to threads (work stealing runtimes were a mistake) or use bind_hub docs.rs/sentry/0.34.0/sentry/trait.SentryFutureExt.html#method.bind_hub

But you cannot make tracing span works without changing scope setting to the on enter/exit

I'm with you. Though this might not fix everything, it surely fixes most of it. I've been using my fork at my employer and haven't really had any issues up to now.

@DoumanAsh
Copy link
Contributor

DoumanAsh commented Oct 30, 2024

I looked over code again and how bind_hub works
And I think tracing layer could use the same Switch guard https://github.com/getsentry/sentry-rust/blob/master/sentry-core/src/hub_impl.rs#L24

Hub could be stored together with span and on every enter, you could create guard, and then drop guard on every exit

This should ensure tracing span will retain the same hub across any thread, if I understood its purpose properly

TL;DR
Extend SentrySpanData as follows:

pub(super) struct SentrySpanData {
    pub(super) sentry_span: TransactionOrSpan,
    parent_sentry_span: Option<TransactionOrSpan>,
    creation_hub: Arc<sentry_core::Hub>,
    //Set to Some() on enter, set to None in on_exit
    switch_guard: Option<sentry_core::hub_impl::SwitchGuard>
}

@DoumanAsh
Copy link
Contributor

@saiintbrisson Can you maybe rebase your work on current master? I'm thinking to use patched version of sentry and maybe will make PR against yours to add Switch hub guard

@DoumanAsh
Copy link
Contributor

DoumanAsh commented Oct 31, 2024

I created prototype of my idea:
saiintbrisson#1

It will capture Hub on span creation and create switch guard on every enter of the span while dropping guard in exit

@Swatinem I believe it should solve all possible concerns with this PR

Copy link

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 79.41176% with 7 lines in your changes missing coverage. Please review.

Project coverage is 72.82%. Comparing base (605d36d) to head (8f139e3).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #686      +/-   ##
==========================================
+ Coverage   72.79%   72.82%   +0.02%     
==========================================
  Files          66       66              
  Lines        7804     7830      +26     
==========================================
+ Hits         5681     5702      +21     
- Misses       2123     2128       +5     

Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another look at this.

It looks good as is, but feel free to also fix the nits that I added before merging.

sentry-tracing/src/layer.rs Show resolved Hide resolved
sentry-core/src/hub_impl.rs Outdated Show resolved Hide resolved
@DoumanAsh
Copy link
Contributor

@saiintbrisson Can you apply comments on your PR?

@Swatinem
Copy link
Member

CI checks are also still unhappy :-(

@DoumanAsh
Copy link
Contributor

DoumanAsh commented Nov 28, 2024

use of deprecated type alias std::panic::PanicInfo: use PanicHookInfo instead

This is due to Rust team introducing unnecessary lints with each version

Do you want to silence it?
Using PanicHookInfo requires 1.81

Nevermind, we need to rebase this branch on master 9b82c90

@saiintbrisson
Copy link
Contributor Author

Hi! Yeah, I'll push fixes in a few hours, thanks for the patience 😅

@saiintbrisson saiintbrisson force-pushed the master branch 3 times, most recently from 8f139e3 to bff4693 Compare November 28, 2024 15:37
@saiintbrisson
Copy link
Contributor Author

@Swatinem I think this is good to go now. Thanks again for your time and the team for the amazing product.

And thanks @DoumanAsh for the support and getting it to the final solution.

@Swatinem Swatinem enabled auto-merge (squash) November 29, 2024 08:02
@Swatinem
Copy link
Member

Thank you for your patience and persistence here :-)

@Swatinem Swatinem merged commit 180a6b1 into getsentry:master Nov 29, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants