-
-
Notifications
You must be signed in to change notification settings - Fork 199
Add SpanProcessor for OpenTelemetry #875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
0ebd132
to
dee9540
Compare
@whatyouhide ☝🏻 I'm having trouble figuring out these failures. Could you tell me how I could debug what exactly causes the :sentry app startup failure? We start it manually in the umbrella test helpers and for whatever reason it started failing when opentelemetry is included. |
Adding just the OTel deps without any other changes leads to the same issue? |
please write some minimal description in the PR about
|
@whatyouhide @sl0thentr0py I narrowed it down to |
@whatyouhide for the time being I addressed it by using optional deps for otel packages via 6fdf121 but then one of the tests in event_test.exs started to fail so I fixed it via d04c5d3 even though I don't understand what's going on there 🙃 |
@solnic is this ready for review? It's still a draft |
Not yet. I got it working but phoenix + bandit spans are not processed in a way that would make sense for Sentry for some reason. I've been investigating how to fix. It seems like phoenix spans are not coming in as children of bandit spans so there's a disconnect here. I'll figure it out 🙂 |
If we're just shipping a We will revisit packaging later once we have a proper working prototype. |
b607ac1
to
bdcc5de
Compare
@sl0thentr0py @whatyouhide this is now open for reviews. I got it deployed to production already and it's working well (see screenshots in the description). |
okay I'm taking a preliminary look now one thing we definitely need is that instead of the boolean
without these, it is very hard for people to control quota spend so this is a hard requirement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some suggestions, looks very good otherwise!
alias Sentry.Interfaces.Span | ||
|
||
@impl true | ||
def on_start(_ctx, otel_span, _config) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you've implemented it, we can do nothing in on_start
since the on_end
logic is sufficient to build the span tree and transaction.
This will also simplify your SpanStorage.update_span
where you can remove the unnecessary deletes.
(This is also how we've done it in the new python and JS span processors.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sl0thentr0py I'll dig into this more but when I make on_start a no-op the tests are no longer passing 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@solnic did you dig into it more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide yes but not until now I managed to figure out how to do it to make this work, I just pushed 9a5c550 that simplified things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking like a good start but still a bunch of work left to do. Let me know if any of the comments are not clear!
c9152c6
to
1654724
Compare
@sl0thentr0py I'll add traces_sampler config in a separate PR |
01585a8
to
5e8fc99
Compare
@whatyouhide @sl0thentr0py ready for another round of reviews 🤞🏻 |
I'll review tomorrow. |
@@ -0,0 +1,31 @@ | |||
if Code.ensure_loaded?(:otel_sampler) do | |||
defmodule Sentry.OpenTelemetry.Sampler do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on why we need a Sentry-specific sampler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide in order to provide a standard interface for traces_sampler
option we need a custom abstraction for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this with links to docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide Sentry has a standard way of handling trace sampling via traces_sampler
configuration option which is meant to be a function that receives a sampling context. In order to provide support for this setting we just need our own Sampler that would call configured tracer sampler with a context that it would be built from information that it receives in the should_sample callback.
Here are docs about this feature in the Ruby SDK as a reference https://docs.sentry.io/platforms/ruby/configuration/sampling/#sampling-transaction-events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the Sampler to be my application-specific concern; this module seems unnecessary to me
require OpenTelemetry | ||
|
||
@fields Record.extract(:span, from_lib: "opentelemetry/include/otel_span.hrl") | ||
Record.defrecordp(:span, @fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this conversation is resolved?
This feels so close! How far away do we think it is from release? |
@jwaldrip yes just waiting for @whatyouhide for the final review :) then I'll follow up with another PR with support for sampler (initial work is already done) and we should be ready to release it 🙂 |
@solnic I am pulling the branch directly and testing with our self hosted instance of sentry. Is there anything we need to do in our API configuration to have these show up in sentry under performance? |
@jwaldrip I'm sorry but I'm not familiar with self hosted instances. I need to confirm if anything extra is needed. For now please try to replicate setup that I added to our test_integrations/phoenix_app test app 🙂 Thanks for helping with testing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really ready for review, there are quite a few comments that have been resolved without being addressed or comments that have not been addressed in general.
This is a big, complex PR, so I would in general suggest splitting it up in "less controversial" pieces. For example, adding a :traces_sample
configuration is very straightforward, and if it were its own PR, then it could be a piece we merge and not think about. Same goes for adding tests to the Phoenix app, for example.
In general, for this kind of workflow I suggest something like stacked PRs if you're familiar with the concept.
alias Sentry.Interfaces.Span | ||
|
||
@impl true | ||
def on_start(_ctx, otel_span, _config) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@solnic did you dig into it more?
defp get_op_description(%{ | ||
attributes: | ||
%{unquote(to_string(MessagingAttributes.messaging_system())) => :oban} = attributes | ||
}) do | ||
{"queue.process", attributes["oban.job.worker"]} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the Oban/OTel integration be the one responsible for getting this right here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide queue.process
is a Sentry concept, that's why we have to handle it ourselves here. From Sentry docs:
queue.process
Based on the knowledge sources, queue.process is a span operation used in Sentry's tracing instrumentation for queue consumers. This operation is used to track when messages are delivered to or processed by a consumer in a messaging system.
Consumer Instrumentation
To implement queue consumer instrumentation with Sentry, you need to:
Create a span with the operation queue.process
Include specific attributes to enrich your consumer spans with queue
metrics
@@ -0,0 +1,31 @@ | |||
if Code.ensure_loaded?(:otel_sampler) do | |||
defmodule Sentry.OpenTelemetry.Sampler do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this with links to docs?
end | ||
|
||
defp setup_span_storage(opts) do | ||
uid = :rand.uniform(1000000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then :rand.uniform/1
won't guarantee that in any way. Use System.unique_integer([:positive])
instead.
Oops, seems like I've got a commit from April 4th that I forgot to push. I used GH PR ext in VSCode to mark things as done but then my branch wasn't synced. Dooh! Sorry about that. The docs were updated according to your requests via f883094
Yes I'm familiar with this approach. Should I start doing it here? |
Previous versions were no longer supported by actions that we rely on.
7b00dd9
to
0559b9e
Compare
@solnic are these the final parts that need to be reviewed by @whatyouhide? Hoping we can see this released this month. 🥺 |
To make this work you need to add opentelemetry deps:
Then configure our span processor:
Things should start looking more or less like this: