
fix: Update Net::HTTP instrumentation to no-op on untraced contexts #722

Conversation

brunofacca
Contributor

Update Net::HTTP instrumentation to no-op on HTTP requests within untraced contexts. Prevents the logging of "called finish on an ended Span" messages.

Example scenario:

The Resque instrumentation can be configured to call OpenTelemetry::SDK::Trace::TracerProvider#force_flush at the end of each job. That causes the batch span processor to immediately send everything it has to the OTLP exporter. The exporter then marks the HTTP request used to send the data to the APM server as "untraced" by adding an untraced key to the context and setting it to true. Before this PR, the Net::HTTP instrumentation did not check for the untraced flag, so it would try to set attributes and call finish on a span that was not recording, creating one "called finish on an ended Span" log entry for each batch of exported spans. On a large application, that results in millions of those log entries per minute.
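The untraced-context mechanism described above can be sketched with a minimal simulation. This is plain Ruby, not the actual gem code: the real implementation stores the flag in OpenTelemetry::Context and the instrumentation patch guards net/http's request method; a hash and a plain method stand in here.

```ruby
# Simulated context; the real gems use an immutable OpenTelemetry::Context.
CONTEXT = {}
LOG = []

# Exporter side: wrap the export request in an "untraced" context,
# restoring the previous state afterwards.
def untraced
  CONTEXT[:untraced] = true
  yield
ensure
  CONTEXT.delete(:untraced)
end

def untraced?
  CONTEXT[:untraced] == true
end

# Instrumentation side: with the fix, the patch no-ops (passes the
# request straight through) when the context is marked untraced.
def instrumented_request
  if untraced?
    LOG << 'no-op: request passed through without a span'
  else
    LOG << 'span started and finished around request'
  end
end

instrumented_request              # an ordinary application request: traced
untraced { instrumented_request } # the exporter's own request: no-op
puts LOG
```

Without the `untraced?` guard, the second call would also try to start and finish a span, which is exactly the source of the log noise described above.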


linux-foundation-easycla bot commented Nov 16, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

Comment on lines 188 to 190
def tracer_stub.in_span(...)
raise 'tracer.in_span should not have been called'
end
Contributor Author

@brunofacca brunofacca Nov 16, 2023


I'm not familiar with Minitest. If you know a simpler or more readable way of asserting that the #in_span method is not called on the tracer, or even a simpler way of asserting that the instrumentation no-ops (i.e., does not attempt to manipulate spans in any way), I'm open to suggestions.

Collaborator


I think that other test suites use rspec-mocks to deal with scenarios like this; that library may give you an easier way to write interaction tests.

https://github.com/search?q=repo%3Aopen-telemetry%2Fopentelemetry-ruby-contrib%20rspec-mocks&type=code

The other option is to not worry about verifying interactions with test doubles and rely on state-based test results, i.e., no spans are created.

https://github.com/open-telemetry/opentelemetry-ruby-contrib/pull/722/files/0c95addf071fce0980d37de81524469725cf5b6f#diff-99a3ac2b3a9bb5c2d057a09c5101ddb9d5b472344b41269b8a9f1269bceabaaaR217

Contributor Author

@brunofacca brunofacca Nov 20, 2023


Using rspec-mocks would definitely make this easier/simpler to do. I didn't know that it could be used with Minitest. Can I add it to opentelemetry-instrumentation-net_http.gemspec as a dev dependency?

The other option is to not worry about verifying interactions with test doubles and rely on the state-based test results, i.e. No spans are created.

That was my initial approach, but then I realized that the call to tracer.in_span causes the "called finish on an ended Span" log noise even without creating a span. That's because Tracer#in_span calls Tracer#start_span, which checks span.context.valid? (false for an untraced context) and therefore returns Span::INVALID. So the count of finished spans does not increase, but that doesn't prevent finish from being called on the untraced span.
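The behavior described above can be shown with a small simulation (simplified stand-ins, not the SDK classes): the invalid span never records, so a state-based span count stays at zero, yet in_span still calls finish on it.

```ruby
# Stand-in for a span; the real API distinguishes recording spans from
# the non-recording Span::INVALID singleton.
class FakeSpan
  attr_reader :finish_count

  def initialize(recording)
    @recording = recording
    @finish_count = 0
  end

  def recording?
    @recording
  end

  def finish
    @finish_count += 1
  end
end

INVALID_SPAN = FakeSpan.new(false) # stands in for Span::INVALID

# Mimics Tracer#in_span: with an invalid context it yields the invalid
# span, but the ensure block calls finish regardless.
def in_span(context_valid)
  span = context_valid ? FakeSpan.new(true) : INVALID_SPAN
  yield span
ensure
  span.finish
end

finished_recording_spans = 0
in_span(false) { |s| finished_recording_spans += 1 if s.recording? }
```

The count stays at 0, so a state-based assertion passes, while `INVALID_SPAN.finish_count` shows that finish was still called — which is why the interaction check above is needed.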

Collaborator


Using rspec-mocks would definitely make this easier/simpler to do. I didn't know that it could be used with Minitest. Can I add it to opentelemetry-instrumentation-net_http.gemspec as a dev dependency?

Yes, please add it. I will create a chore issue to update all the test specifications to include it in development dependencies so folks don't run into similar constraints in the future.

That doesn't prevent it from calling finish in an untraced span though.

I see. We use in_span all over the place. Is this a problem in other instrumentations and we just haven't seen the reports?

Contributor Author


Yes, please add it.

Will do, and thank you.

Is this a problem in other instrumentations?

I don't think so. At least in our app, the issue is triggered by a specific interaction between the Resque instrumentation (configured with the force_flush: :always option) and the Net::HTTP instrumentation, as described in this PR's description. When the Resque instrumentation calls force_flush, the OTLP exporter marks its own HTTP requests to the APM server as untraced, but the Net::HTTP instrumentation is unaware of the untraced: true attribute in the context, so it tries to add info to the span and then calls finish on it. And that causes the log noise.

Here's a sample (partial) stack trace:

/usr/local/bundle/gems/opentelemetry-instrumentation-net_http-0.22.2/lib/opentelemetry/instrumentation/net/http/patches/instrumentation.rb:80:in `annotate_span_with_response!'
/usr/local/bundle/gems/opentelemetry-instrumentation-net_http-0.22.2/lib/opentelemetry/instrumentation/net/http/patches/instrumentation.rb:39:in `block (2 levels) in request'
<internal:kernel>:90:in `tap'
/usr/local/bundle/gems/opentelemetry-instrumentation-net_http-0.22.2/lib/opentelemetry/instrumentation/net/http/patches/instrumentation.rb:38:in `block in request'
/usr/local/bundle/gems/opentelemetry-api-1.2.3/lib/opentelemetry/trace/tracer.rb:37:in `block in in_span'
/usr/local/bundle/gems/opentelemetry-api-1.2.3/lib/opentelemetry/trace.rb:70:in `block in with_span'
/usr/local/bundle/gems/opentelemetry-api-1.2.3/lib/opentelemetry/context.rb:87:in `with_value'
/usr/local/bundle/gems/opentelemetry-api-1.2.3/lib/opentelemetry/trace.rb:70:in `with_span'
/usr/local/bundle/gems/opentelemetry-api-1.2.3/lib/opentelemetry/trace/tracer.rb:37:in `in_span'
/usr/local/bundle/gems/opentelemetry-instrumentation-net_http-0.22.2/lib/opentelemetry/instrumentation/net/http/patches/instrumentation.rb:31:in `request'
/usr/local/bundle/gems/opentelemetry-exporter-otlp-0.26.1/lib/opentelemetry/exporter/otlp/exporter.rb:160:in `block (2 levels) in send_bytes'
/usr/local/bundle/gems/opentelemetry-exporter-otlp-0.26.1/lib/opentelemetry/exporter/otlp/exporter.rb:238:in `measure_request_duration'
/usr/local/bundle/gems/opentelemetry-exporter-otlp-0.26.1/lib/opentelemetry/exporter/otlp/exporter.rb:160:in `block in send_bytes'
/usr/local/bundle/gems/opentelemetry-exporter-otlp-0.26.1/lib/opentelemetry/exporter/otlp/exporter.rb:128:in `block in around_request'
/usr/local/bundle/gems/opentelemetry-common-0.20.0/lib/opentelemetry/common/utilities.rb:92:in `block in untraced'
/usr/local/bundle/gems/opentelemetry-api-1.2.3/lib/opentelemetry/context.rb:87:in `with_value'
/usr/local/bundle/gems/opentelemetry-common-0.20.0/lib/opentelemetry/common/utilities.rb:91:in `untraced'
/usr/local/bundle/gems/opentelemetry-exporter-otlp-0.26.1/lib/opentelemetry/exporter/otlp/exporter.rb:128:in `around_request'
/usr/local/bundle/gems/opentelemetry-exporter-otlp-0.26.1/lib/opentelemetry/exporter/otlp/exporter.rb:152:in `send_bytes'
/usr/local/bundle/gems/opentelemetry-exporter-otlp-0.26.1/lib/opentelemetry/exporter/otlp/exporter.rb:86:in `export'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb:187:in `block in export_batch'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb:187:in `synchronize'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb:187:in `export_batch'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb:114:in `force_flush'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/tracer_provider.rb:109:in `block (2 levels) in force_flush'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/tracer_provider.rb:105:in `map'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/tracer_provider.rb:105:in `block in force_flush'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/tracer_provider.rb:101:in `synchronize'
/usr/local/bundle/gems/opentelemetry-sdk-1.3.0/lib/opentelemetry/sdk/trace/tracer_provider.rb:101:in `force_flush'
/usr/local/bundle/gems/opentelemetry-instrumentation-resque-0.5.0/lib/opentelemetry/instrumentation/resque/patches/resque_job.rb:60:in `perform'
/usr/local/bundle/gems/resque-2.6.0/lib/resque/worker.rb:322:in `perform'

Contributor Author


PR updated.

Update Net::HTTP instrumentation to no-op on HTTP requests within untraced
contexts. Prevents the logging of "called finish on an ended Span" messages.

Example scenario:

The Resque instrumentation can be configured to call
`OpenTelemetry::SDK::Trace::TracerProvider#force_flush` at the end of each job. That
causes the batch span processor to immediately send everything it has to the
OTLP exporter. The exporter then marks the HTTP request used to send the data
to the APM server as "untraced", by adding an `untraced` key to the context and
setting it to `true`. Before this commit, the `Net::HTTP` instrumentation did
not check for the `untraced` flag. So it would try to set attributes and call
`finish` on a span that was not recording, creating one "called finish on an
ended Span" log entry for each batch of exported spans. On a large application,
that results in millions of those log entries per minute.
@brunofacca brunofacca force-pushed the net-http-noop-on-untraced-contexts branch from 0c95add to 75a3aed Compare November 20, 2023 20:32
@arielvalentin arielvalentin merged commit 3b8ec51 into open-telemetry:main Nov 22, 2023
46 checks passed