Proposal: OTel SDK should expose a metric to inform about sampling decisions made #5756
Labels: enhancement (New feature or request), needs-spec-change (Issues which require the OpenTelemetry Specification to clarify or define behavior), pkg:OpenTelemetry (Issues related to OpenTelemetry NuGet package)
Package
OpenTelemetry
Is your feature request related to a problem?
Imagine that you have added a sampler to your trace configuration, something like:
This results in sampling decisions based on whether the incoming request carries a trace header: if it does, the sampler honors that header; if it does not, the sampler samples at most 3 requests per second.
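To make the scenario concrete, here is a minimal, language-neutral sketch in Python of the decision logic just described. It is not the OTel .NET API: `Decision`, `ParentContext`, `RateLimitedRootSampler` and `parent_based_decision` are hypothetical names, and the rate limit is assumed to be a simple fixed one-second window (real rate-limiting samplers may use a token bucket or similar instead). In this simplified version an unsampled parent always leads to a drop, so the record_only outcome never appears.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    # The three outcomes named in the proposal.
    DROP = "drop"
    RECORD_ONLY = "record_only"
    RECORD_AND_SAMPLE = "record_and_sample"


@dataclass(frozen=True)
class ParentContext:
    # Hypothetical view of the parent extracted from an incoming trace header.
    is_remote: bool
    sampled: bool


class RateLimitedRootSampler:
    """Samples at most `max_per_second` root spans per fixed one-second window."""

    def __init__(self, max_per_second: int, clock: Callable[[], float] = time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock  # injectable so the window logic can be tested
        self.window_start = clock()
        self.count = 0

    def should_sample(self) -> Decision:
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new one-second window has started: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.max_per_second:
            self.count += 1
            return Decision.RECORD_AND_SAMPLE
        return Decision.DROP


def parent_based_decision(parent: Optional[ParentContext],
                          root: RateLimitedRootSampler) -> Decision:
    # With a trace header present, honor the parent's sampled flag.
    if parent is not None:
        return Decision.RECORD_AND_SAMPLE if parent.sampled else Decision.DROP
    # Without one, fall back to the rate-limited root sampler.
    return root.should_sample()
```

With five header-less requests arriving in the same second, the first three are sampled and the last two dropped; nothing in the trace output alone tells you those two existed, which is the gap this proposal targets.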
When you look at the output, you get a mix of traces, but you don't get a good understanding of which traces were dropped and why.
What is the expected behavior?
I am suggesting a new metric, opentelemetry.trace.sampler.count, implemented by the OTel SDK, that reports the number of Activities the sampler was called for, together with the sampling result. It should have the following dimensions, shown with their possible or example values:
- sampling.decision: drop, record_only, record_and_sample
- span.parent.is_remote: true
- span.parent.recorded: false
- span.name: Microsoft.AspNetCore.Hosting.HttpRequestIn
- sampler.description: fixedratesampler{0.2}
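A minimal sketch of what the metric would aggregate, assuming an in-memory counter keyed by the proposed attribute set. A real SDK would presumably use a Counter instrument from its metrics API instead; `SamplerCall`, `SamplerCountMetric` and the `use_source_name` switch are hypothetical. The switch models the ActivitySource.Name fallback discussed next, and keeping the `span.name` key while substituting the source name is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Tuple

# Attribute keys taken from the proposal's dimension list.
DIMENSIONS = (
    "sampling.decision",
    "span.parent.is_remote",
    "span.parent.recorded",
    "span.name",
    "sampler.description",
)


@dataclass(frozen=True)
class SamplerCall:
    # Hypothetical record of one sampler invocation.
    decision: str             # "drop", "record_only" or "record_and_sample"
    parent_is_remote: bool
    parent_recorded: bool
    span_name: str
    source_name: str          # ActivitySource.Name in .NET terms
    sampler_description: str


class SamplerCountMetric:
    """In-memory stand-in for opentelemetry.trace.sampler.count."""

    def __init__(self, use_source_name: bool = False):
        # When span names are too high-cardinality, key on the source name instead.
        self.use_source_name = use_source_name
        self.counts: Counter = Counter()

    def record(self, call: SamplerCall) -> None:
        name = call.source_name if self.use_source_name else call.span_name
        key = (
            call.decision,
            str(call.parent_is_remote).lower(),  # True -> "true"
            str(call.parent_recorded).lower(),
            name,
            call.sampler_description,
        )
        self.counts[key] += 1

    def points(self) -> Dict[Tuple[Tuple[str, str], ...], int]:
        # Expose each series as (attribute, value) pairs mapped to its count.
        return {tuple(zip(DIMENSIONS, key)): n for key, n in self.counts.items()}
```

Each distinct combination of attribute values becomes its own time series, which is why high-cardinality values such as span names are a concern.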
The span.name values may be too varied (too high-cardinality) to be suitable as a metric dimension, in which case we should use the ActivitySource.Name instead. The goal is to give the observer some idea of which spans/activities are being sampled each way.
The expected use of the metric is to provide observability into the trace sampling decisions made by the SDK. Looking at sampling.decision gives you a measure of how many Activities are dropped, only recorded, and recorded and emitted. The ratios of these counts should match what you have configured in the sampler, given the incoming request rate.
The additional dimensions enable better diagnosis of why a sampling decision was made, while being limited to fields whose values are constrained enough for use as metric dimensions.
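The ratio check described above can be sketched as follows. This is a hypothetical helper, not part of any SDK: it assumes the counts have already been summed per sampling.decision value and filtered to root requests (those without an incoming trace header), and it assumes the rate-limited root sampler from the scenario, whose expected sampled share is min(1, limit / incoming rate). The 5% tolerance is an arbitrary choice.

```python
import math
from typing import Dict

DECISIONS = ("drop", "record_only", "record_and_sample")


def decision_ratios(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of sampler calls for each sampling.decision value."""
    total = sum(counts.get(d, 0) for d in DECISIONS)
    if total == 0:
        return {d: 0.0 for d in DECISIONS}
    return {d: counts.get(d, 0) / total for d in DECISIONS}


def expected_sampled_ratio(limit_per_second: float, incoming_per_second: float) -> float:
    """Expected record_and_sample share for root requests under a rate limit."""
    if incoming_per_second <= 0:
        return 0.0
    return min(1.0, limit_per_second / incoming_per_second)


def matches_configuration(counts: Dict[str, int], limit_per_second: float,
                          incoming_per_second: float, tolerance: float = 0.05) -> bool:
    # Compare the observed sampled share with what the configuration implies.
    observed = decision_ratios(counts)["record_and_sample"]
    expected = expected_sampled_ratio(limit_per_second, incoming_per_second)
    return math.isclose(observed, expected, abs_tol=tolerance)
```

For example, with 10 root requests per second arriving and a limit of 3 per second, about 30% of root sampler calls should land in record_and_sample; a markedly different share would point to a misconfigured or misbehaving sampler.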
Which alternative solutions or features have you considered?
The sampling state is available through EventSource events, but those are not easily monitored, so including this information in the metrics the SDK already produces makes more sense to me.
Additional context
This seems like something that would apply to other language SDKs that perform head-based sampling. The same metric could also be used for tail sampling in a component such as the OTel Collector.