Can we define a shared thread-local storage format for Spans? #24

yurishkuro · 2016-03-28T14:32:46Z

While explicit context propagation is in many cases the ideal solution, it's not always practical and often requires that people change their code significantly. A thread-local storage (TLS) of the "current span" is widely used approach in various Java tracers. Rather than leaving this issue up for implementations, it would be useful, imo, to define a common mechanism directly in OpenTracing, and increase interoperability between different instrumentations. Specifically, we'd need to define at least two APIs:

saving and retrieving the "current span"
providing a wrapper Runnable / Callable / etc. for carrying the "current span" over thread boundaries

For (1), the two possible approaches are:

(1a) a single-span TLS
(1b) a stack-like TLS

Some prior discussions on Gitter:

yurishkuro Mar 24 12:33
@bensigelman @dkuebric @adriancole for in-process context propagation in Java, what are your thoughts on what thread-local storage should contain: a single current Span, or a stack of spans?

bensigelman Mar 26 00:02
@yurishkuro sorry for the delay re ^^^, just noticed this! this is a “rich” (i.e., complex) topic, but the short answer IMO is “a single current Span”.

bensigelman Mar 26 00:32
@yurishkuro @dkuebric @michaelsembwever
perhaps more interesting (IMO) is what helpers exist to set and clear that span… One idea is something like the following:

public abstract class TracedRunnable implements Runnable {
    private SpanBuilder childBuilder;
    public TracedRunnable(SpanBuilder childBuilder) {
        this.childBuilder = childBuilder;
    }

    // For subclasses to override… akin to java.lang.Runnable.run().
    public abstract void runInSpan(Span sp);

    // The compatibility bridge to java.lang.Runnable
    public final run() {
        Span parentSpan = Pseudocode.getSpanFromTLS();
        Span childSpan = childBuilder.setParent(parentSpan).start();
        runInSpan(childSpan);
        // We need to reinstate parentSpan before returning from run().
        Pseudocode.setSpanInTLS(parentSpan);
    }
}

Calling code would look approximately like this:

SpanBuilder b = tracer.buildSpan(“child_operation”);
somethingThatNeedsARunnableClosure(arg1, arg2, new SpanRunnable(b) {
    public void runInSpan(Span sp) {
        … do something, etc …
    }
});

PS: @jmacd thoughts/recollections about the above? It's not precisely what I remember from Java dapper, though maybe that’s not a bad thing. :)

yurishkuro Mar 26 18:08
@bensigelman the last line, Pseudocode.setSpanInTLS(parentSpan); is the reason I asked - it works in this simple example (if you add try/finally), but isn't that great in things like Jersey filters, because there are two independent filters, for request and for response. To to be able to reference parentSpan it needs to be stored somewhere else, since TLS slot is occupied by the child span. If TLS slot was instead a stack, then the request filter would do TLS.push(childSpan), and response filter would do child = TLS.pop(); child.end(), thus automatically restoring the parent.
what I don't like about the stack is a possibility of scoping errors, leading to mismatched push/pop

bensigelman Mar 26 22:55
@yurishkuro in the case of something like Jersey, I suppose the Span could be attached to the ContainerRequestContext. But that isn’t satisfying at all… doesn’t seem general enough.
Having thought this through more carefully, I would argue (albeit with hesitation) for something like a stack… every Span would have an optional parent Span, and at finish-time the parent Span would be reinstated into TLS. In the example you gave above, you’d get the behavior you want with TLS as long as a Span is created at Jersey request filter time, and the same span is finished at Jersey response filter time. (It can of course create children of its own, etc, etc)
If the above doesn’t make any sense, maybe there’s some sort of git gist or something we could use as a place to make the problem more concrete?

The text was updated successfully, but these errors were encountered:

yurishkuro · 2016-03-28T14:33:05Z

cc @michaelsembwever @oibe

yurishkuro · 2016-03-28T16:40:19Z

Some prior art & discussions: DistributedTracing/continuation-local-storage-jvm#1

jmacd · 2016-03-28T20:18:49Z

@bensigelman What you posted looks good, but I recall Google didn't promote the use of wrapped Runnable objects as much as it promoted the use of a new threads for each span, so there was effectively not a stack. The RPC client never had to swap the span because it blocked, and the server never had to swap the span because it created or re-used a thread. e.g.,

  new TraceContextThread(SpanBuilder.SetOperation(...).
      StartSpan(TraceContext.getCurrentSpan() /* i.e., parent */), new Runnable() {
    public void run() { ... }
  }).start();

toaler · 2016-03-30T16:32:31Z

We ended up creating a Span factory that provides static methods to create/destroy spans that set/remove the span from thread local. The factory also allows us to get the span in context on the corresponding thread (similar to TraceContext.getCurrentSpan()). Have not considered cross thread communication as of yet.

yurishkuro · 2016-03-30T20:42:14Z

@toaler can you point to or paste some code examples? From your description of the "factory" it seems it had a much richer API than needed purely for propagating span via TLS. We wouldn't want to replicate OT API for spans in another class.

sjoerdtalsma · 2016-12-06T15:55:31Z

What do you think of the following approach?

I'm in the process of releasing a 'context-propagation' library where I abstracted the 'Context' concept and created a Service interface for ContextManager and a utility-class ContextManagers that allows taking a snapshot of 'all' compatible Context implementations.
This snapshot can be 'reactivated' in another thread, returning an AutoCloseable instance, restoring the state after use.
An abstract ThreadLocal based context implementation is provided that effectively restores the state in a stack-like fashion when closed.
Having your own Context implementation cooperate in the propagation only requires declaring your ContextManager's classname in a META-INF/services property file.

https://github.com/talsma-ict/context-propagation

I would be willing to provide an implementation in 'contrib' that uses this construct to manage spans with implicit context, using delegation to existing Tracer implementations..

yurishkuro · 2016-12-06T18:55:22Z

fyi, larger discussion here: opentracing/specification#23

bhs · 2017-04-15T22:40:54Z

also see #115...

bhs · 2017-05-24T06:13:44Z

(Closed by #115)

bhs modified the milestone: 1.0.0 release Apr 1, 2017

bhs closed this as completed May 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can we define a shared thread-local storage format for Spans? #24

Can we define a shared thread-local storage format for Spans? #24

yurishkuro commented Mar 28, 2016

yurishkuro commented Mar 28, 2016

yurishkuro commented Mar 28, 2016

jmacd commented Mar 28, 2016

toaler commented Mar 30, 2016

yurishkuro commented Mar 30, 2016

sjoerdtalsma commented Dec 6, 2016

yurishkuro commented Dec 6, 2016

bhs commented Apr 15, 2017

bhs commented May 24, 2017

Can we define a shared thread-local storage format for Spans? #24

Can we define a shared thread-local storage format for Spans? #24

Comments

yurishkuro commented Mar 28, 2016

yurishkuro commented Mar 28, 2016

yurishkuro commented Mar 28, 2016

jmacd commented Mar 28, 2016

toaler commented Mar 30, 2016

yurishkuro commented Mar 30, 2016

sjoerdtalsma commented Dec 6, 2016

yurishkuro commented Dec 6, 2016

bhs commented Apr 15, 2017

bhs commented May 24, 2017