Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

Can we define a shared thread-local storage format for Spans? #24

Closed
yurishkuro opened this issue Mar 28, 2016 · 9 comments
Closed

Can we define a shared thread-local storage format for Spans? #24

yurishkuro opened this issue Mar 28, 2016 · 9 comments
Milestone

Comments

@yurishkuro
Copy link
Member

While explicit context propagation is in many cases the ideal solution, it's not always practical and often requires that people change their code significantly. A thread-local storage (TLS) of the "current span" is widely used approach in various Java tracers. Rather than leaving this issue up for implementations, it would be useful, imo, to define a common mechanism directly in OpenTracing, and increase interoperability between different instrumentations. Specifically, we'd need to define at least two APIs:

  1. saving and retrieving the "current span"
  2. providing a wrapper Runnable / Callable / etc. for carrying the "current span" over thread boundaries

For (1), the two possible approaches are:

  • (1a) a single-span TLS
  • (1b) a stack-like TLS

Some prior discussions on Gitter:

yurishkuro Mar 24 12:33
@bensigelman @dkuebric @adriancole for in-process context propagation in Java, what are your thoughts on what thread-local storage should contain: a single current Span, or a stack of spans?

bensigelman Mar 26 00:02
@yurishkuro sorry for the delay re ^^^, just noticed this! this is a “rich” (i.e., complex) topic, but the short answer IMO is “a single current Span”.

bensigelman Mar 26 00:32
@yurishkuro @dkuebric @michaelsembwever
perhaps more interesting (IMO) is what helpers exist to set and clear that span… One idea is something like the following:

public abstract class TracedRunnable implements Runnable {
    private SpanBuilder childBuilder;
    public TracedRunnable(SpanBuilder childBuilder) {
        this.childBuilder = childBuilder;
    }

    // For subclasses to override… akin to java.lang.Runnable.run().
    public abstract void runInSpan(Span sp);

    // The compatibility bridge to java.lang.Runnable
    public final run() {
        Span parentSpan = Pseudocode.getSpanFromTLS();
        Span childSpan = childBuilder.setParent(parentSpan).start();
        runInSpan(childSpan);
        // We need to reinstate parentSpan before returning from run().
        Pseudocode.setSpanInTLS(parentSpan);
    }
}

Calling code would look approximately like this:

SpanBuilder b = tracer.buildSpan(“child_operation”);
somethingThatNeedsARunnableClosure(arg1, arg2, new SpanRunnable(b) {
    public void runInSpan(Span sp) {
        … do something, etc …
    }
});

PS: @jmacd thoughts/recollections about the above? It's not precisely what I remember from Java dapper, though maybe that’s not a bad thing. :)

yurishkuro Mar 26 18:08
@bensigelman the last line, Pseudocode.setSpanInTLS(parentSpan); is the reason I asked - it works in this simple example (if you add try/finally), but isn't that great in things like Jersey filters, because there are two independent filters, for request and for response. To to be able to reference parentSpan it needs to be stored somewhere else, since TLS slot is occupied by the child span. If TLS slot was instead a stack, then the request filter would do TLS.push(childSpan), and response filter would do child = TLS.pop(); child.end(), thus automatically restoring the parent.
what I don't like about the stack is a possibility of scoping errors, leading to mismatched push/pop

bensigelman Mar 26 22:55
@yurishkuro in the case of something like Jersey, I suppose the Span could be attached to the ContainerRequestContext. But that isn’t satisfying at all… doesn’t seem general enough.
Having thought this through more carefully, I would argue (albeit with hesitation) for something like a stack… every Span would have an optional parent Span, and at finish-time the parent Span would be reinstated into TLS. In the example you gave above, you’d get the behavior you want with TLS as long as a Span is created at Jersey request filter time, and the same span is finished at Jersey response filter time. (It can of course create children of its own, etc, etc)
If the above doesn’t make any sense, maybe there’s some sort of git gist or something we could use as a place to make the problem more concrete?

@yurishkuro
Copy link
Member Author

cc @michaelsembwever @oibe

@yurishkuro
Copy link
Member Author

Some prior art & discussions: DistributedTracing/continuation-local-storage-jvm#1

@jmacd
Copy link

jmacd commented Mar 28, 2016

@bensigelman What you posted looks good, but I recall Google didn't promote the use of wrapped Runnable objects as much as it promoted the use of a new threads for each span, so there was effectively not a stack. The RPC client never had to swap the span because it blocked, and the server never had to swap the span because it created or re-used a thread. e.g.,

  new TraceContextThread(SpanBuilder.SetOperation(...).
      StartSpan(TraceContext.getCurrentSpan() /* i.e., parent */), new Runnable() {
    public void run() { ... }
  }).start();

@toaler
Copy link

toaler commented Mar 30, 2016

We ended up creating a Span factory that provides static methods to create/destroy spans that set/remove the span from thread local. The factory also allows us to get the span in context on the corresponding thread (similar to TraceContext.getCurrentSpan()). Have not considered cross thread communication as of yet.

@yurishkuro
Copy link
Member Author

@toaler can you point to or paste some code examples? From your description of the "factory" it seems it had a much richer API than needed purely for propagating span via TLS. We wouldn't want to replicate OT API for spans in another class.

@sjoerdtalsma
Copy link
Contributor

What do you think of the following approach?

I'm in the process of releasing a 'context-propagation' library where I abstracted the 'Context' concept and created a Service interface for ContextManager and a utility-class ContextManagers that allows taking a snapshot of 'all' compatible Context implementations.
This snapshot can be 'reactivated' in another thread, returning an AutoCloseable instance, restoring the state after use.
An abstract ThreadLocal based context implementation is provided that effectively restores the state in a stack-like fashion when closed.
Having your own Context implementation cooperate in the propagation only requires declaring your ContextManager's classname in a META-INF/services property file.

https://github.com/talsma-ict/context-propagation

I would be willing to provide an implementation in 'contrib' that uses this construct to manage spans with implicit context, using delegation to existing Tracer implementations..

@yurishkuro
Copy link
Member Author

fyi, larger discussion here: opentracing/specification#23

@bhs bhs modified the milestone: 1.0.0 release Apr 1, 2017
@bhs
Copy link
Contributor

bhs commented Apr 15, 2017

also see #115...

@bhs
Copy link
Contributor

bhs commented May 24, 2017

(Closed by #115)

@bhs bhs closed this as completed May 24, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants