-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Context concurrency and parallelism issues #1766
Comments
@arielvalentin thanks for your great summary of an issue, including the links to real source code which exhibits the problem. I'm from the Ruby core team and work on concurrency and issues like this. Copying around thread locals like this is like copying around private instance variables between objects, it's a horrible idea and should never be done. There is now a feature in Ruby, which does this and is expected to do this:
The fiber storage is inherited by copy so mutations in child keys isn't propagated upwards, but you can use mutable values (which must be thread safe) for this. Using thread locals for "per request state" is an extremely bad idea. |
@ioquatix Thanks for sharing your perspective here. I suspect that library maintainers are doing this out of convenience without knowing the negative impact that it has on their users. In our case I would say we generally want implicit immutable state staring within the same thread, but likely explicit sharing when spawning fibers or threads. As you pointed out if we did use fiber storage, we are still in need an efficient fiber local Immutable Stack or Array data structure to avoid non-deterministic behavior. There are also other use cases that our users tend to rely on for resetting some state process forking. I personally rely on I know it is quite risky to implement (e.g. fork bombing) but has there been any discussion to add lifecycle hooks to register custom code when processes, threads, or fibers are spawned? |
@ioquatix A related problem we have is finding a solution for re-entrant locks on our thread safe structures. The SDK currently use mutexes to ensure thread safety when mutating a Span. We have a use case where we need to allow the thread to re-enter a mutex described here. I haven't done much in the way of exploring any additional options but your input would be helpful here: |
We should strongly encourage library authors to avoid this kind of code, and graphql-ruby are going to fix their implementation: rmosolgo/graphql-ruby#5173 (comment) Fiber storage supports immutable updates, e.g.
(assuming
Yes, see For thread and fiber locals, you may prefer to use attributes, these are not going to be copied by the code previously discussed, e.g.
|
I benchmarked that approach and several others, result summary is #1760 (comment). For the recursive benchmark, a Fiber attribute is 20% slower than |
I would try to avoid having a dependency on |
You should try |
👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the |
On a tiny note, while the current storage mechanism is quite flexible internally, it does make it quite hard external tools such as profilers to get this data to include as correlation (e.g. DataDog/dd-trace-rb#3984 is quite a maze). #842 is an earlier discussion of this too. With opentelemetry also now having support for profiling, this use case may become more common (e.g. beyond just Datadog's Ruby profiler). |
The
OpenTelemetry::Context
uses a fiber local Ruby Array to keep track of the activeContext
instance.There are at least 2 instances in gems that are breaking encapsulation and copying fiber local variables to child threads:
This leads to concurrency and parallel processing issues because multi-threads are pushing and popping entries on the same Stack, resulting in
OpenTelemetry::Context::DetachError
as well as non-deterministic behavior where theOpenTelemetry::Context.current
method returns the wrong instance of the context object.Helpers that rely on the implicitly active Context, like
OpenTelemetry::Trace.current_span
, will end up receiving the wrong Span object instance.There is a proposed fixed here that would end up using Immutable Arrays, however this will increase the number of object allocations per Context stack mutation:
#1760
Another option I have considered is to propose using Thread Local variables instead of Fiber local variables, however that would result in concurrency issues when using Fibers where multiple fibers would have access to the same Array instance. We would need a mechanism to seed a spawned thread with a snapshot of the stack so that there could be some consistent context propagation for the gems listed above.
I would like to get some feedback from @open-telemetry/ruby-maintainers on whether or not there are addition alternatives we can consider.
The text was updated successfully, but these errors were encountered: