Proposal: OTel SDK should expose a metric to inform about sampling decisions made #5756

samsp-msft · 2024-07-17T22:00:20Z

Package

OpenTelemetry

Is your feature request related to a problem?

Imagine the scenario that you have added a sampler to your Trace configuration, something like:

    builder.Services.AddOpenTelemetry()
        .WithTracing(tracing =>
        {
            tracing.AddAspNetCoreInstrumentation()
                .AddHttpClientInstrumentation()
               .SetSampler(new ParentBasedSampler(new RateLimitingSampler(3)))
        });

This will result in sampling decisions being made based on whether the incoming request has a trace header - if so it will honor that header, and if not it will sample a max of 3 requests per second.

When you look at the output you will get a combination of traces, but what you don't get is a good understanding of what traces got dropped and why.

What is the expected behavior?

I am suggesting that we have a new metric: opentelemetry.trace.sampler.count, implemented by the OTel SDK that provides details about the number of Activities that the sampler was called for and the sampling result. It should be dimensioned with:

Name	Type	Description	Example
`sampling.decision`	string	The final verdict as to how the activity/span was flagged for sampling	`drop`, `record_only`, `record_and_sample`
`span.parent.is_remote`	bool	Whether the parent is a remote span or local	`true`
`span.parent.recorded`	bool	Whether the parent trace has the recorded flag set or not	`false`
`span.name`	string	The name of the span that is being sampled	`Microsoft.AspNetCore.Hosting.HttpRequestIn`
`sampler.description`	string	The description from the sampler that made the decision	`fixedratesampler{0.2}`

The span.name may be too varied to be suitable for use in a metric, in which case we should make this the ActivitySource.Name. The goal being to give the observer some idea of which spans/activities are being sampled each way.

The expected use of the metric is to have observability into the trace sampling decisions that are being made by the sdk. By looking at the sampling.decision, you get a measure of how many Activities are being dropped, just recorded and those emitted. The ratios of these numbers should match what you have configured in the sampler and the incoming request rate.

The additional fields are to enable better diagnostics as to why the sampling decision was made, but limiting it to the fields that have constrained enough values for use in metric dimensions.

Which alternative solutions or features have you considered?

While the sampling state is available through EventSource events, that is not easily monitored, and so including this in the metrics already produced makes more sense to me.

Additional context

This feels like something that would apply to other languages/sdks where head-based sampling occurs. The same metric could also be used for tail sampling in a component like the OTel collector.

The text was updated successfully, but these errors were encountered:

samsp-msft · 2024-07-17T22:11:49Z

Tagging some folks: @reyang @lmolkova @CodeBlanch @cijothomas @noahfalk @kalyanaj

kalyanaj · 2024-07-18T15:25:12Z

Tagging @jmacd (to evaluate/consider this feedback for the specification).

cijothomas · 2024-07-18T15:31:20Z

I think this issue is best moved to spec/sem.convention repo, as this is independent of language implementation.

samsp-msft added the enhancement New feature or request label Jul 17, 2024

github-actions bot added the pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package label Jul 17, 2024

CodeBlanch added this to the Future milestone Aug 16, 2024

CodeBlanch added the needs-spec-change Issues which require the OpenTelemetry Specification to clarify or define behavior label Aug 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: OTel SDK should expose a metric to inform about sampling decisions made #5756

Proposal: OTel SDK should expose a metric to inform about sampling decisions made #5756

samsp-msft commented Jul 17, 2024 •

edited

Loading

samsp-msft commented Jul 17, 2024

kalyanaj commented Jul 18, 2024

cijothomas commented Jul 18, 2024

Proposal: OTel SDK should expose a metric to inform about sampling decisions made #5756

Proposal: OTel SDK should expose a metric to inform about sampling decisions made #5756

Comments

samsp-msft commented Jul 17, 2024 • edited Loading

Package

Is your feature request related to a problem?

What is the expected behavior?

Which alternative solutions or features have you considered?

Additional context

samsp-msft commented Jul 17, 2024

kalyanaj commented Jul 18, 2024

cijothomas commented Jul 18, 2024

samsp-msft commented Jul 17, 2024 •

edited

Loading