Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[logs] Mitigate unwanted object creation during configuration reload #5514

Merged
merged 19 commits into from
Apr 12, 2024

Conversation

CodeBlanch
Copy link
Member

@CodeBlanch CodeBlanch commented Apr 5, 2024

Fixes #5513

Changes

  • Treat OpenTelemetryLoggerOptions as a singleton and ignore IConfiguration reload(s).

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes

@CodeBlanch CodeBlanch added pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package logs Logging signal related labels Apr 5, 2024
@CodeBlanch CodeBlanch requested a review from a team April 5, 2024 21:40
Copy link

codecov bot commented Apr 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.30%. Comparing base (6250307) to head (cbbbbaa).
Report is 176 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5514      +/-   ##
==========================================
+ Coverage   83.38%   85.30%   +1.91%     
==========================================
  Files         297      289       -8     
  Lines       12531    12580      +49     
==========================================
+ Hits        10449    10731     +282     
+ Misses       2082     1849     -233     
Flag Coverage Δ
unittests ?
unittests-Instrumentation-Experimental 24.10% <0.00%> (?)
unittests-Instrumentation-Stable 24.14% <0.00%> (?)
unittests-Solution-Experimental 85.29% <100.00%> (?)
unittests-Solution-Stable 85.25% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...lemetry/Logs/ILogger/OpenTelemetryLoggerOptions.cs 86.95% <ø> (ø)
...try/Logs/ILogger/OpenTelemetryLoggingExtensions.cs 97.43% <100.00%> (+0.03%) ⬆️

... and 64 files with indirect coverage changes

@utpilla
Copy link
Contributor

utpilla commented Apr 5, 2024

I call this a mitigation because there really isn't a way to fix this. We will just now clean up these processors so the GC can do its thing.

Is there no way to prevent the creation of these processors? Also, there was some discussion around this happening due to the usage of IOptionsMonitor. If that's the case, is it feasible to get rid of IOptionsMonitor?

@CodeBlanch
Copy link
Member Author

@utpilla

Is there no way to prevent the creation of these processors?

I haven't come up with a way to prevent it. The problem is in these configuration delegates users may do this:

options.AddProcessor(new MyProcessor());

MyProcessor is created before the call to AddProcessor. All we can really do is make sure it gets disposed.

Also, there was some discussion around this happening due to the usage of IOptionsMonitor. If that's the case, is it feasible to get rid of IOptionsMonitor?

Kind of but not totally.

We could change these two spots...

var options = sp.GetRequiredService<IOptionsMonitor<OpenTelemetryLoggerOptions>>().CurrentValue;

sp.GetRequiredService<IOptionsMonitor<OpenTelemetryLoggerOptions>>().CurrentValue,

... to use IOptions<OpenTelemetryLoggerOptions>.Value and our tests would pass making it look like the issue went away.

But we have this public ctor we can't change:

public OpenTelemetryLoggerProvider(IOptionsMonitor<OpenTelemetryLoggerOptions> options)

Anyone using that will run into the problem.

Also we can't prevent anyone from doing sp.GetRequiredService<IOptionsMonitor<OpenTelemetryLoggerOptions>>() on their own which would reintroduce the issue.

We may want to also support hot reload of things like IncludeFormattedMessage in the future which would require use of IOptionsMonitor.

I decided to go this route because I thought it would work more broadly.

@reyang
Copy link
Member

reyang commented Apr 8, 2024

I call this a mitigation because there really isn't a way to fix this. We will just now clean up these processors so the GC can do its thing.

Is there no way to prevent the creation of these processors? Also, there was some discussion around this happening due to the usage of IOptionsMonitor. If that's the case, is it feasible to get rid of IOptionsMonitor?

+1, I think the issue is not about "thread leaks", it is "unwanted objects creation during configuration refresh".

@utpilla
Copy link
Contributor

utpilla commented Apr 8, 2024

But we have this public ctor we can't change:
Anyone using that will run into the problem.
Also we can't prevent anyone from doing sp.GetRequiredService<IOptionsMonitor<OpenTelemetryLoggerOptions>>() on their own which would reintroduce the issue.

That should be fine, right? If a user is explicitly using IOptionsMonitor they should be aware of the consequences of using it.

We may want to also support hot reload of things like IncludeFormattedMessage in the future which would require use of IOptionsMonitor.

We could still make that available in future using some newer public API or parameter. However, our default setup can avoid IOptionsMonitor.

@CodeBlanch
Copy link
Member Author

CodeBlanch commented Apr 9, 2024

That should be fine, right? If a user is explicitly using IOptionsMonitor they should be aware of the consequences of using it.

How will they know OpenTelemetryLoggerOptions has these quirks/pitfalls/anti-patterns when used via IOptionsMonitor? In that same vein, how do we prevent future issues from being raised which will require triage?

We could still make that available in future using some newer public API or parameter.

Let's say some contributor comes and adds an option EnableReload and they hook up IOptionsMonitor when that is set. Same question as above. How will that contributor avoid re-introducing this issue?

Simply switching to IOptions<OpenTelemetryLoggerOptions> seems to just kick the can down the road.

I just pushed updates. What I decided to do was just disable reload completely for OpenTelemetryLoggerOptions. Now whatever the code in the process uses (either IOptions or IOptionsMonitor or both) the following will be true:

  • There will only ever be a single instance of OpenTelemetryLoggerOptions.
  • That instance will never be reloaded.
  • Configurations and validations will only ever be processed during startup when that initial instance is created.

@reyang
Copy link
Member

reyang commented Apr 9, 2024

Let's say some contributor comes and adds an option EnableReload and they hook up IOptionsMonitor when that is set. Same question as above. How will that contributor avoid re-introducing this issue?

The golden answer: unit test.

@CodeBlanch
Copy link
Member Author

@reyang

Let's say some contributor comes and adds an option EnableReload and they hook up IOptionsMonitor when that is set. Same question as above. How will that contributor avoid re-introducing this issue?

The golden answer: unit test.

That's the thing. If I did what @utpilla originally suggested and just switched to IOptions<OpenTelemetryLoggerOptions> adding a unit test using IOptionsMonitor<OpenTelemetryLoggerOptions> would just re-introduce the issue! We could have a test validating that the behavior fails as we would expect. But that doesn't really help the above, I don't think. Our friendly future contributor adding EnableReload wouldn't know any difference... the test verifying it is bugged would just continue to pass with the bug.

But I think the point is moot. The current form of the PR disabling reload I think checks all the boxes. If in the future this friendly contributor wants to add EnableReload they will have to do it in a way that passes the unit tests. Good luck to them 🤣

@CodeBlanch CodeBlanch changed the title [logs] Mitigate thread leaks when reloading configuration [logs] Mitigate unwanted object creation during configuration reload Apr 9, 2024
@utpilla
Copy link
Contributor

utpilla commented Apr 9, 2024

From #5513

The delegate passed to OpenTelemetryLoggingExtensions.AddOpenTelemetry is registered as an options configuration (IConfigureOptions) and will be replayed on a fresh instance of OpenTelemetryLoggerOptions anytime the application IConfiguration is reloaded.

@CodeBlanch Could you also tell why is this only a problem for Logs and not other signals? Or why was IConfigureOptions only required for Logs?

@@ -2,6 +2,12 @@

## Unreleased

* New instances of `OpenTelemetryLoggerOptions` will no longer be created during
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the description (which is phrased as a behavior change) here might be confusing to users - most of them probably don't even know about OpenTelemetryLoggerOptions.

Consider being more explicit here by starting with "Fixed an issue ..." (whether OpenTelemetryLoggerOptions will be created or not is an implementation detail that we might change later if we figured out how to properly support dynamic configuration reload).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New version pushed

@reyang
Copy link
Member

reyang commented Apr 9, 2024

@reyang

Let's say some contributor comes and adds an option EnableReload and they hook up IOptionsMonitor when that is set. Same question as above. How will that contributor avoid re-introducing this issue?

The golden answer: unit test.

That's the thing. If I did what @utpilla originally suggested and just switched to IOptions<OpenTelemetryLoggerOptions> adding a unit test using IOptionsMonitor<OpenTelemetryLoggerOptions> would just re-introduce the issue! We could have a test validating that the behavior fails as we would expect. But that doesn't really help the above, I don't think. Our friendly future contributor adding EnableReload wouldn't know any difference... the test verifying it is bugged would just continue to pass with the bug.

But I think the point is moot. The current form of the PR disabling reload I think checks all the boxes. If in the future this friendly contributor wants to add EnableReload they will have to do it in a way that passes the unit tests. Good luck to them 🤣

Sorry I don't understand, could you add a unit test to make sure that even if configuration got changed, there is only one processor/exporter created? (fail the test if there is more than one exporter created)

@CodeBlanch
Copy link
Member Author

CodeBlanch commented Apr 9, 2024

Could you also tell why is this only a problem for Logs and not other signals? Or why was IConfigureOptions only required for Logs?

A lot of history here. I'll do my best to try and explain it 🤣

There are two different concepts: Builders and Options. Builders use the IServiceCollection to register services. Options are retrieved via the IServiceProvider. Tracing and Metrics have true builders. They operate over an IServiceCollection. Logging uses Options. Its Options are automatically mapped to the logging:OpenTelemetry config path by ILogger. OpenTelemetryLoggerOptions doesn't exist until the IServiceProvider has been created. The AddProcessor API on OpenTelemetryLoggerOptions is really a Builder API but it lives on an Options class. This is really the original sin. It creates a bunch of weirdness and prevents logging from having parity with Tracing and Metrics. This is what OTEL1000 is about.

IConfigureOptions is not used for Tracing and Metrics because it is using Builders and not Options.

Would it fix the issue in Logging to not do this?

        if (configureOptions != null)
            services.Configure(configureOptions);

It will fix some issues (but not all) and it will introduce whole new issues 🤣

For example users can do these types of things...

services.Configure<OpenTelemetryLoggerOptions>(o => o.AddProcessor(...));
services.Configure<OpenTelemetryLoggerOptions>(o => o.AddSomeExporterViaSomeExtensions(...));
services.AddOptions<OpenTelemetryLoggerOptions>()
    .Configure<IConfiguration>((options, config) =>
    {
        if (config.GetValue("OpenTelemetry:EnableLogging", false))
        {
            options.AddProcessor(new BatchLogRecordExportProcessor(new MyExporter()));
        }
    });

If we remove our IConfigureOptions registration it won't do anything to help with other valid Options API patterns.

But removing it will introduce ordering issues. For example this...

services.AddLogging(logging => logging.AddOpenTelemetry(o => o.IncludeFormattedMessage = true));
services.PostConfigure<OpenTelemetryLoggerOptions>(o => o.IncludeFormattedMessage = false);

Options API has a certain order it executes in. In that code the PostConfigure delegate will run after the delegate passed to AddOpenTelemetry. PostConfigure will be the last thing run. The reason this works is we use IConfigureOptions to play nice with Options API.

If we did something like this instead...

var options = sp.GetRequiredService<IOptions<OpenTelemetryLoggerOptions>>().Value;
if (configureOptions != null)
   configureOptions(options); // Manually invoke configuration delegate instead of using Options API

What that is doing is manually invoking the configuration delegate on the final options instance. That will make the order essentially:

// Execute PostConfigure
o.IncludeFormattedMessage = false; // What the user wanted
// Manually invoke configure delegate
o => o.IncludeFormattedMessage = true; // What the user ends up with

Logging using IConfigureOptions instead of manually invoking the delegate is about making sure its Options behave consistently with the Options API overall.

@CodeBlanch
Copy link
Member Author

Sorry I don't understand, could you add a unit test to make sure that even if configuration got changed, there is only one processor/exporter created? (fail the test if there is more than one exporter created)

The tests are there already:

https://github.com/open-telemetry/opentelemetry-dotnet/pull/5514/files#diff-e0db2cd47f5135f96f7b5db2a56f02e95201fcb1e72ce67040fdce7daa0dc420R304

https://github.com/open-telemetry/opentelemetry-dotnet/pull/5514/files#diff-e0db2cd47f5135f96f7b5db2a56f02e95201fcb1e72ce67040fdce7daa0dc420R371

What I was saying is those tests are not possible using the design @utpilla was suggesting.

@reyang
Copy link
Member

reyang commented Apr 9, 2024

Sorry I don't understand, could you add a unit test to make sure that even if configuration got changed, there is only one processor/exporter created? (fail the test if there is more than one exporter created)

What I was saying is those tests are not possible using the design @utpilla was suggesting.

I wasn't following this. I'm good 👍 as long as there is unit test coverage.

reyang
reyang previously approved these changes Apr 10, 2024
Comment on lines 176 to 180
// Note: We disable built-in IOptionsMonitor features for
// OpenTelemetryLoggerOptions as a workaround to prevent unwanted
// objects (processors, exporters, etc.) being created by
// configuration delegates during reload of IConfiguration.
services.DisableOptionsMonitor<OpenTelemetryLoggerOptions>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need all this infrastructure? Instead, can't you just use IOptions instead of IOptionsMonitor everywhere?

cc @davidfowl @halter73 @tarekgh

Copy link
Member Author

@CodeBlanch CodeBlanch Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is changing OTel to use IOptions (instead of IOptionsMonitor) for OpenTelemetryLoggerOptions. But my thinking is, we can't prevent users from accessing IOptionsMonitor and we can't prevent some future dev from re-introducing it. The infrastructure here is so we can make it deterministic and have unit tests validating it will work correctly should IOptionsMonitor<OpenTelemetryLoggerOptions> sneak into the process anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do we really expect someone outside of OpenTelemetry to try to get an IOptionsMonitor<OpenTelemetryLoggerOptions>? What would they do with it?
  2. If someone does explicitly use IOptionsMonitor, don't they want updates as the config changes? That's why they choose IOptionsMonitor.

Would it be better to just fail in this case if we explicitly want to block it?

Copy link

@halter73 halter73 Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's unclear why anyone else would need to monitor OpenTelemetryLoggerOptions for changes when the core library won't react to changes itself, it feels like code smell to go out of our way prevent it by replacing core options services and changing how they work. What if someone wants to just new up an OpenTelemetryLoggerOptions and calls options.AddProcessor(new BatchLogRecordExportProcessor(new OtlpLogExporter(new())) in a test or something? Should they know that this will spawn a thread that will never get stopped?

I think the core issue is how BatchExportProcessor<T> spawns a thread in its constructor.

this.exporterThread = new Thread(this.ExporterProc)
{
IsBackground = true,
Name = $"OpenTelemetry-{nameof(BatchExportProcessor<T>)}-{exporter.GetType().Name}",
};
this.exporterThread.Start();

I know that this API has already shipped, so adding a StartAsync method or something like that may not be feasible, but could you unseal OnStart in the base class and start the thread there? Or lazily start the thread some other way? If not, should we deprecate the BaseProcessor<LogRecord> overload of AddProcessor() and tell people to use the Func<IServiceProvider, BaseProcessor<LogRecord> overload instead?

If we have to override the IOptionsMonitor<OpenTelemetryLoggerOptions>, I agree with @eerhardt that it should throw from everything with a NotSupportedException indicating that reloading OpenTelemetryLoggerOptions is completely unsupported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed updates so that accessing OpenTelemetryLoggerOptions via IOptionsMonitor or IOptionsSnapshot will result in a NotSupportedException. Technically breaking, but I think the impact will be very low.

If not, should we deprecate the BaseProcessor overload of AddProcessor() and tell people to use the Func<IServiceProvider, BaseProcessor overload instead?

This is the plan yes (more or less)! We have another API for building logging pipelines which does not suffer from these issues. The plan is to make that a stable API (#5442) and then we can Obsolete these AddProcessor methods on OpenTelemetryLoggerOptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good discussion and thanks for the suggestions/ideas!

While I agree with most parts here, I suggest that we focus on "mitigate the issue quickly with minimum change/risk" instead of trying to get a complete solution in this PR. I'm specifically concerned about throwing exception in this PR. I suggest that we take the feedback and think about exception or other approaches in a follow up PR once the mitigation/hotfix is released.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we focus on "mitigate the issue quickly with minimum change/risk" instead of trying to get a complete solution in this PR

So that means just changing our references of IOptionsMonitor<OpenTelemetryLoggerOptions> => IOptions<OpenTelemetryLoggerOptions>? Don't do anything to prevent users from accessing IOptionsMonitor. Am I understanding that correctly?

If so, I agree that since this is a hotfix, keep it scoped to resolving the issue at hand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinion regarding the actual solution (e.g. IOptions/IOptionsMonitor, singleton, etc.), as long as it meets the two conditions #5514 (review) I'm good 👍.

@reyang reyang dismissed their stale review April 11, 2024 22:08

I don't think throwing exception in this PR is the right thing to do, might be a good topic for another PR.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm fine with whatever approach that stops the unwanted object creation as long as the following conditions meet:

  1. It is a simple change with very low risk, ideally with minimal lines of code touched.
  2. It is not giving user another shock (e.g. throwing exceptions and killing their app).

Any improvements can be made in future PRs after this hotfix.

@CodeBlanch CodeBlanch merged commit 0bbebb5 into open-telemetry:main Apr 12, 2024
50 checks passed
@CodeBlanch CodeBlanch deleted the sdk-log-options-reload branch April 12, 2024 18:17
CodeBlanch added a commit to CodeBlanch/opentelemetry-dotnet that referenced this pull request Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logs Logging signal related pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[logs] Reloading application configuration may lead to unnecessary thread creation
5 participants