Brave is a library used to capture and report latency information about distributed operations to Zipkin. Most users won't use Brave directly; rather, they use libraries or frameworks that employ Brave on their behalf.
This module includes a tracer that creates and joins spans that model the latency of potentially distributed work. It also includes libraries to propagate the trace context over network boundaries, for example, via http headers.
Most importantly, you need a Tracer, configured to report to Zipkin.
Here's an example setup that sends trace data (spans) to Zipkin over http (as opposed to Kafka).
// Configure a reporter, which controls how often spans are sent
// (the dependency is io.zipkin.reporter2:zipkin-sender-okhttp3)
sender = OkHttpSender.create("http://127.0.0.1:9411/api/v2/spans");
spanReporter = AsyncReporter.create(sender);
// Create a tracing component with the service name you want to see in Zipkin.
tracing = Tracing.newBuilder()
.localServiceName("my-service")
.spanReporter(spanReporter)
.build();
// Tracing exposes objects you might need, most importantly the tracer
tracer = tracing.tracer();
// Failing to close resources can result in dropped spans! When tracing is no
// longer needed, close the components you made in reverse order. This might be
// a shutdown hook for some users.
tracing.close();
spanReporter.close();
sender.close();
If you need to connect to an older version of the Zipkin api, you can use the following to use Zipkin v1 format. See zipkin-reporter for more.
sender = URLConnectionSender.create("http://localhost:9411/api/v1/spans");
reporter = AsyncReporter.builder(sender)
.build(SpanBytesEncoder.JSON_V1);
The tracer creates and joins spans that model the latency of potentially distributed work. It can employ sampling to reduce overhead in process or to reduce the amount of data sent to Zipkin.
Spans returned by a tracer report data to Zipkin when finished, or do nothing if unsampled. After starting a span, you can annotate events of interest or add tags containing details or lookup keys.
Spans have a context which includes trace identifiers that place it at the correct spot in the tree representing the distributed operation.
When tracing code that never leaves your process, run it inside a scoped span.
// Start a new trace or a span within an existing trace representing an operation
ScopedSpan span = tracer.startScopedSpan("encode");
try {
// The span is in "scope" meaning downstream code such as loggers can see trace IDs
return encoder.encode();
} catch (RuntimeException | Error e) {
span.error(e); // Unless you handle exceptions, you might not know the operation failed!
throw e;
} finally {
span.finish(); // always finish the span
}
When you need more features, or finer control, use the Span type:
// Start a new trace or a span within an existing trace representing an operation
Span span = tracer.nextSpan().name("encode").start();
// Put the span in "scope" so that downstream code such as loggers can see trace IDs
try (SpanInScope ws = tracer.withSpanInScope(span)) {
return encoder.encode();
} catch (RuntimeException | Error e) {
span.error(e); // Unless you handle exceptions, you might not know the operation failed!
throw e;
} finally {
span.finish(); // note the scope is independent of the span. Always finish a span.
}
Both of the above examples report the exact same span on finish!
In the above example, the span will be either a new root span or the next child in an existing trace. How this works is described later.
Once you have a span, you can add tags to it, which can be used as lookup keys or details. For example, you might add a tag with your runtime version.
span.tag("clnt/finagle.version", "6.36.0");
When exposing the ability to customize spans to third parties, prefer brave.SpanCustomizer as opposed to brave.Span. The former is simpler to understand and test, and doesn't tempt users with span lifecycle hooks.
interface MyTraceCallback {
void request(Request request, SpanCustomizer customizer);
}
Since brave.Span implements brave.SpanCustomizer, it is just as easy for you to pass to users.
Ex.
for (MyTraceCallback callback : userCallbacks) {
callback.request(request, span);
}
Sometimes you won't know if a trace is in progress or not, and you don't want users to do null checks. brave.CurrentSpanCustomizer adds to any span that's in progress or drops data accordingly.
Ex.
// Some DI configuration wires up the current span customizer
@Bean SpanCustomizer currentSpanCustomizer(Tracing tracing) {
return CurrentSpanCustomizer.create(tracing);
}
// user code can then inject this without a chance of it being null.
@Inject SpanCustomizer span;
void userCode() {
span.annotate("tx.started");
...
}
Check for instrumentation written here and Zipkin's list before rolling your own RPC instrumentation!
RPC tracing is often done automatically by interceptors. Behind the scenes, they add tags and events that relate to their role in an RPC operation.
Note: this is intentionally not an HTTP example, as we have a special layer for that.
Here's an example of a client span:
// before you send a request, add metadata that describes the operation
span = tracer.nextSpan().name(service + "/" + method).kind(CLIENT);
span.tag("myrpc.version", "1.0.0");
span.remoteServiceName("backend");
span.remoteIpAndPort("172.3.4.1", 8108);
// Add the trace context to the request, so it can be propagated in-band
tracing.propagation().injector(Request::addHeader)
.inject(span.context(), request);
// when the request is scheduled, start the span
span.start();
// if there is an error, tag the span
span.tag("error", error.getCode());
// when the response is complete, finish the span
span.finish();
Sometimes you need to model an asynchronous operation, where there is a request, but no response. In normal RPC tracing, you use span.finish() which indicates the response was received. In one-way tracing, you use span.flush() instead, as you don't expect a response.
Here's how a client might model a one-way operation:
// start a new span representing a client request
oneWaySend = tracer.nextSpan().name(service + "/" + method).kind(CLIENT);
--snip--
// Add the trace context to the request, so it can be propagated in-band
tracing.propagation().injector(Request::addHeader)
.inject(oneWaySend.context(), request);
// fire off the request asynchronously, totally dropping any response
request.execute();
// start the client side and flush instead of finish
oneWaySend.start().flush();
And here's how a server might handle this:
// pull the context out of the incoming request
extractor = tracing.propagation().extractor(Request::getHeader);
// convert that context to a span which you can name and add tags to
oneWayReceive = tracer.nextSpan(extractor.extract(request))
.name("process-request")
.kind(SERVER)
... add tags etc.
// start the server side and flush instead of finish
oneWayReceive.start().flush();
// you should not modify this span anymore as it is complete. However,
// you can create children to represent follow-up work.
next = tracer.newChild(oneWayReceive.context()).name("step2").start();
Note: The above propagation logic is a simplified version of our http handlers.
There's a working example of a one-way span here.
Sampling may be employed to reduce the data collected and reported out of process. When a span isn't sampled, it adds no overhead (noop).
Sampling is an up-front decision, meaning that the decision to report data is made at the first operation in a trace, and that decision is propagated downstream.
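To make the up-front decision concrete, here is a minimal plain-Java sketch. It does not use Brave's types: the Context class, newRoot and newChild are hypothetical names chosen for illustration. The point is only that the sampler is consulted once, at the root, and every later span inherits the result.

```java
import java.util.function.LongPredicate;

/** Hypothetical model of an up-front sampling decision; not Brave's implementation. */
final class UpFrontSamplingSketch {
  /** Minimal stand-in for a trace context: identifiers plus the sampling decision. */
  static final class Context {
    final long traceId;
    final long spanId;
    final boolean sampled;

    Context(long traceId, long spanId, boolean sampled) {
      this.traceId = traceId;
      this.spanId = spanId;
      this.sampled = sampled;
    }
  }

  /** The sampler is consulted only here, at the first operation in a trace. */
  static Context newRoot(long traceId, LongPredicate sampler) {
    return new Context(traceId, traceId, sampler.test(traceId));
  }

  /** Children never re-run the sampler: they inherit the parent's decision. */
  static Context newChild(Context parent, long spanId) {
    return new Context(parent.traceId, spanId, parent.sampled);
  }

  public static void main(String[] args) {
    LongPredicate evenOnly = traceId -> traceId % 2 == 0; // toy policy for illustration
    Context root = newRoot(42L, evenOnly);
    Context child = newChild(root, 7L);
    System.out.println("root sampled=" + root.sampled + ", child sampled=" + child.sampled);
  }
}
```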
By default, there's a global sampler that applies a single rate to all traced operations. Tracing.Builder.sampler is how you indicate this, and it defaults to trace every request.
For example, to choose 10 traces every second, you'd initialize like so:
tracing = Tracing.newBuilder()
.sampler(RateLimitingSampler.create(10))
--snip--
.build();
Some need to sample based on the type or annotations of a java method.
Most users will use a framework interceptor which automates this sort of policy. Here's how they might work internally.
// derives a sample rate from an annotation on a java method
DeclarativeSampler<Traced> declarativeSampler = DeclarativeSampler.create(Traced::sampleRate);
@Around("@annotation(traced)")
public Object traceThing(ProceedingJoinPoint pjp, Traced traced) throws Throwable {
// When there is no trace in progress, this decides using an annotation
Sampler decideUsingAnnotation = declarativeSampler.toSampler(traced);
Tracer tracer = tracing.tracer().withSampler(decideUsingAnnotation);
// This code looks the same as if there was no declarative override
ScopedSpan span = tracer.startScopedSpan(spanName(pjp));
try {
return pjp.proceed();
} catch (RuntimeException | Error e) {
span.error(e);
throw e;
} finally {
span.finish();
}
}
You may want to apply different policies depending on what the operation is. For example, you might not want to trace requests to static resources such as images, or you might want to trace all requests to a new api.
Most users will use a framework interceptor which automates this sort of policy. Here's how they might work internally.
Sampler fallback = tracing.sampler();
Span nextSpan(final Request input) {
Sampler requestBased = new Sampler() {
@Override public boolean isSampled(long traceId) {
if (input.url().startsWith("/experimental")) {
return true;
} else if (input.url().startsWith("/static")) {
return false;
}
return fallback.isSampled(traceId);
}
};
return tracer.withSampler(requestBased).nextSpan();
}
Note: the above is the basis for the built-in http sampler
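The same policy can be exercised without any tracing library. The following is a sketch under stated assumptions: PathSamplingSketch and isSampled are hypothetical names, and a LongPredicate stands in for the fallback sampler. It shows the order of the rules: explicit allow, explicit deny, then the global policy.

```java
import java.util.function.LongPredicate;

/** Hypothetical path-based sampling policy; not Brave's built-in http sampler. */
final class PathSamplingSketch {
  /** Decides for a request path, deferring to the fallback when no rule matches. */
  static boolean isSampled(String path, long traceId, LongPredicate fallback) {
    if (path.startsWith("/experimental")) return true; // always trace the new api
    if (path.startsWith("/static")) return false;      // never trace static resources
    return fallback.test(traceId);                     // otherwise use the global policy
  }

  public static void main(String[] args) {
    LongPredicate never = traceId -> false;
    LongPredicate always = traceId -> true;
    System.out.println(isSampled("/experimental/search", 1L, never));
    System.out.println(isSampled("/static/logo.png", 1L, always));
    System.out.println(isSampled("/api/users", 1L, never));
  }
}
```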
Propagation is needed to ensure activity originating from the same root is collected together in the same trace. The most common propagation approach is to copy a trace context from a client sending an RPC request to a server receiving it.
For example, when a downstream Http call is made, its trace context is sent along with it, encoded as request headers:
   Client Span                                                Server Span
┌──────────────────┐                                       ┌──────────────────┐
│                  │                                       │                  │
│   TraceContext   │          Http Request Headers         │   TraceContext   │
│ ┌──────────────┐ │          ┌───────────────────┐        │ ┌──────────────┐ │
│ │ TraceId      │ │          │ X-B3-TraceId      │        │ │ TraceId      │ │
│ │              │ │          │                   │        │ │              │ │
│ │ ParentSpanId │ │ Inject   │ X-B3-ParentSpanId │ Extract│ │ ParentSpanId │ │
│ │              ├─┼─────────>│                   ├────────┼>│              │ │
│ │ SpanId       │ │          │ X-B3-SpanId       │        │ │ SpanId       │ │
│ │              │ │          │                   │        │ │              │ │
│ │ Sampled      │ │          │ X-B3-Sampled      │        │ │ Sampled      │ │
│ └──────────────┘ │          └───────────────────┘        │ └──────────────┘ │
│                  │                                       │                  │
└──────────────────┘                                       └──────────────────┘
The names above are from B3 Propagation, which is built-in to Brave and has implementations in many languages and frameworks.
Most users will use a framework interceptor which automates propagation. Here's how they might work internally.
Here's what client-side propagation might look like
// configure a function that injects a trace context into a request
injector = tracing.propagation().injector(Request.Builder::addHeader);
// before a request is sent, add the current span's context to it
injector.inject(span.context(), request);
Here's what server-side propagation might look like
// configure a function that extracts the trace context from a request
extractor = tracing.propagation().extractor(Request::getHeader);
// when a server receives a request, it joins or starts a new trace
span = tracer.nextSpan(extractor.extract(request));
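For readers who want to see what ends up on the wire, here is a simplified plain-Java model of B3 header propagation. It is a sketch, not Brave's codec: B3HeaderSketch and its methods are hypothetical, a Map stands in for the request, and only the 64-bit case is handled. The header names are the B3 names mentioned above, with IDs encoded as 16 lower-hex characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of B3 header propagation; Brave's real codec handles more cases. */
final class B3HeaderSketch {
  /** Client side: write the trace context into outgoing headers. */
  static void inject(long traceId, Long parentId, long spanId, Boolean sampled,
      Map<String, String> headers) {
    headers.put("X-B3-TraceId", toLowerHex(traceId));
    if (parentId != null) headers.put("X-B3-ParentSpanId", toLowerHex(parentId));
    headers.put("X-B3-SpanId", toLowerHex(spanId));
    if (sampled != null) headers.put("X-B3-Sampled", sampled ? "1" : "0");
  }

  /** Server side: read an ID back out of incoming headers, or null if absent. */
  static Long extractId(Map<String, String> headers, String name) {
    String hex = headers.get(name);
    return hex == null ? null : Long.parseUnsignedLong(hex, 16);
  }

  /** IDs travel as 16 lower-hex characters, zero padded on the left. */
  static String toLowerHex(long id) {
    return String.format("%016x", id);
  }

  public static void main(String[] args) {
    Map<String, String> headers = new LinkedHashMap<>();
    inject(1L, null, 2L, true, headers); // a root span has no parent ID
    System.out.println(headers);
    System.out.println(extractId(headers, "X-B3-SpanId"));
  }
}
```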
Sometimes you need to propagate extra fields, such as a request ID or an alternate trace context. For example, if you are in a Cloud Foundry environment, you might want to pass the request ID:
// when you initialize the builder, define the extra field you want to propagate
tracingBuilder.propagationFactory(
ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "x-vcap-request-id")
);
// later, you can tag that request ID or use it in log correlation
requestId = ExtraFieldPropagation.get("x-vcap-request-id");
Brave is an infrastructure library: you will create lock-in if you expose its apis into business code. Prefer exposing your own types for utility functions that use this class as this will insulate you from lock-in.
While it may seem convenient, do not use this for security context propagation as it was not designed for this use case. For example, anything placed in here can be accessed by any code in the same classloader!
You may also need to propagate a second trace context transparently. For example, when in an Amazon Web Services environment, but not reporting data to X-Ray. To ensure X-Ray can co-exist correctly, pass-through its tracing header like so.
tracingBuilder.propagationFactory(
ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "x-amzn-trace-id")
);
You can also prefix fields, if they follow a common pattern. For example, the following will propagate the field "x-vcap-request-id" as-is, but send the fields "country-code" and "user-id" on the wire as "baggage-country-code" and "baggage-user-id" respectively.
Set up your tracing instance with allowed fields:
tracingBuilder.propagationFactory(
ExtraFieldPropagation.newFactoryBuilder(B3Propagation.FACTORY)
.addField("x-vcap-request-id")
.addPrefixedFields("baggage-", Arrays.asList("country-code", "user-id"))
.build()
);
Later, you can call the following to set or read the country code of the current trace context:
ExtraFieldPropagation.set("country-code", "FO");
String countryCode = ExtraFieldPropagation.get("country-code");
Or, if you have a reference to a trace context, use it explicitly
ExtraFieldPropagation.set(span.context(), "country-code", "FO");
String countryCode = ExtraFieldPropagation.get(span.context(), "country-code");
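The naming rule for prefixed fields can be pictured as a simple mapping from the name used in code to the name sent on the wire. This is only an illustrative sketch in plain Java: PrefixedFieldSketch and wireNames are hypothetical, and the real factory does more than rename keys.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of how field names map to header names on the wire. */
final class PrefixedFieldSketch {
  /** Maps each field name used in code to the header name sent downstream. */
  static Map<String, String> wireNames(List<String> plainFields, String prefix,
      List<String> prefixedFields) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String field : plainFields) result.put(field, field);             // sent as-is
    for (String field : prefixedFields) result.put(field, prefix + field); // prefix added
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> names = wireNames(
        Arrays.asList("x-vcap-request-id"),
        "baggage-",
        Arrays.asList("country-code", "user-id"));
    names.forEach((field, header) -> System.out.println(field + " -> " + header));
  }
}
```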
You may have some fields that you would like to propagate in-process, but not downstream to other hosts. Use .addRedactedField() to indicate a field which should not be injected into headers.
The TraceContext.Extractor<C> reads trace identifiers and sampling status from an incoming request or message. The carrier is usually a request object or headers.
This utility is used in standard instrumentation like HttpServerHandler, but can also be used for custom RPC or messaging code.
TraceContextOrSamplingFlags is usually only used with Tracer.nextSpan(extracted), unless you are sharing span IDs between a client and a server.
A normal instrumentation pattern is creating a span representing the server side of an RPC. Extractor.extract might return a complete trace context when applied to an incoming client request. Tracer.joinSpan attempts to continue this trace, using the same span ID if supported, or creating a child span if not. When the span ID is shared, data reported includes a flag saying so.
Here's an example of B3 propagation:
                              ┌───────────────────┐      ┌───────────────────┐
 Incoming Headers             │   TraceContext    │      │   TraceContext    │
┌───────────────────┐(extract)│ ┌───────────────┐ │(join)│ ┌───────────────┐ │
│ X-B3-TraceId      │─────────┼─┼> TraceId      │ │──────┼─┼> TraceId      │ │
│                   │         │ │               │ │      │ │               │ │
│ X-B3-ParentSpanId │─────────┼─┼> ParentSpanId │ │──────┼─┼> ParentSpanId │ │
│                   │         │ │               │ │      │ │               │ │
│ X-B3-SpanId       │─────────┼─┼> SpanId       │ │──────┼─┼> SpanId       │ │
└───────────────────┘         │ │               │ │      │ │               │ │
                              │ │               │ │      │ │  Shared: true │ │
                              │ └───────────────┘ │      │ └───────────────┘ │
                              └───────────────────┘      └───────────────────┘
Some propagation systems only forward the parent span ID, detected when Propagation.Factory.supportsJoin() == false. In this case, a new span ID is always provisioned and the incoming context determines the parent ID.
Here's an example of AWS propagation:
                              ┌───────────────────┐      ┌───────────────────┐
 x-amzn-trace-id              │   TraceContext    │      │   TraceContext    │
┌───────────────────┐(extract)│ ┌───────────────┐ │(join)│ ┌───────────────┐ │
│ Root              │─────────┼─┼> TraceId      │ │──────┼─┼> TraceId      │ │
│                   │         │ │               │ │      │ │               │ │
│ Parent            │─────────┼─┼> SpanId       │ │──────┼─┼> ParentSpanId │ │
└───────────────────┘         │ └───────────────┘ │      │ │               │ │
                              └───────────────────┘      │ │  SpanId: New  │ │
                                                         │ └───────────────┘ │
                                                         └───────────────────┘
Note: Some span reporters do not support sharing span IDs. For example, if you set Tracing.Builder.spanReporter(amazonXrayOrGoogleStackdriver), disable join via Tracing.Builder.supportsJoin(false). This will force a new child span on Tracer.joinSpan().
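The difference between joining and creating a child can be modeled in a few lines. This is a sketch in plain Java, not Brave's code: JoinSketch, its Context class and the explicit newSpanId parameter are hypothetical, chosen so the two outcomes shown in the diagrams are easy to compare.

```java
/** Hypothetical model of join-or-child behavior on the server side. */
final class JoinSketch {
  /** Minimal stand-in for a trace context. parentId is null for a root span. */
  static final class Context {
    final long traceId;
    final Long parentId;
    final long spanId;
    final boolean shared;

    Context(long traceId, Long parentId, long spanId, boolean shared) {
      this.traceId = traceId;
      this.parentId = parentId;
      this.spanId = spanId;
      this.shared = shared;
    }
  }

  /** Continues the extracted trace, sharing the span ID only when join is supported. */
  static Context joinSpan(Context extracted, boolean supportsJoin, long newSpanId) {
    if (supportsJoin) {
      // Same IDs as the client side; the data is flagged as shared
      return new Context(extracted.traceId, extracted.parentId, extracted.spanId, true);
    }
    // A new span ID is provisioned and the incoming span becomes the parent
    return new Context(extracted.traceId, extracted.spanId, newSpanId, false);
  }

  public static void main(String[] args) {
    Context incoming = new Context(1L, null, 2L, false);
    Context joined = joinSpan(incoming, true, 3L);
    Context child = joinSpan(incoming, false, 3L);
    System.out.println("joined spanId=" + joined.spanId + " shared=" + joined.shared);
    System.out.println("child spanId=" + child.spanId + " parentId=" + child.parentId);
  }
}
```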
TraceContext.Extractor<C> is implemented by a Propagation.Factory plugin. Internally, this code will create the union type TraceContextOrSamplingFlags with one of the following:
- TraceContext if trace and span IDs were present.
- TraceIdContext if a trace ID was present, but not span IDs.
- SamplingFlags if no identifiers were present.
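The three cases above amount to a small decision table. The sketch below expresses it in plain Java under loud assumptions: ExtractionResultSketch, the Kind enum and the header keys "trace-id" and "span-id" are all hypothetical placeholders, not names used by any real propagation format.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of which union member an extractor would produce. */
final class ExtractionResultSketch {
  enum Kind { TRACE_CONTEXT, TRACE_ID_CONTEXT, SAMPLING_FLAGS }

  /** Classifies incoming headers by which identifiers are present. */
  static Kind classify(Map<String, String> headers) {
    boolean hasTraceId = headers.containsKey("trace-id"); // placeholder header name
    boolean hasSpanId = headers.containsKey("span-id");   // placeholder header name
    if (hasTraceId && hasSpanId) return Kind.TRACE_CONTEXT;
    if (hasTraceId) return Kind.TRACE_ID_CONTEXT;
    return Kind.SAMPLING_FLAGS;
  }

  public static void main(String[] args) {
    Map<String, String> both = new HashMap<>();
    both.put("trace-id", "0000000000000001");
    both.put("span-id", "0000000000000002");
    System.out.println(classify(both));
    System.out.println(classify(Collections.singletonMap("trace-id", "0000000000000001")));
    System.out.println(classify(Collections.<String, String>emptyMap()));
  }
}
```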
Some Propagation implementations carry extra data from point of extraction (ex. reading incoming headers) to injection (ex. writing outgoing headers). For example, it might carry a request ID. When implementations have extra data, here's how they handle it:
- If a TraceContext was extracted, add the extra data as TraceContext.extra()
- Otherwise, add it as TraceContextOrSamplingFlags.extra(), which Tracer.nextSpan handles.
By default, data recorded before Span.finish() are reported to Zipkin via what's passed to Tracing.Builder.spanReporter. FinishedSpanHandler can modify or drop data before it goes to Zipkin. It can even intercept data that is not sampled for Zipkin.
FinishedSpanHandler can return false to drop spans that you never want to see in Zipkin. This should be used carefully as the spans dropped should not have children.
Here's an example that drops SQL COMMENT spans so they don't clutter Zipkin.
tracingBuilder.addFinishedSpanHandler(new FinishedSpanHandler() {
@Override public boolean handle(TraceContext context, MutableSpan span) {
return !"comment".equals(span.name());
}
});
Another example is redaction: you may need to scrub tags to ensure no personal information like social security numbers end up in Zipkin.
tracingBuilder.addFinishedSpanHandler(new FinishedSpanHandler() {
@Override public boolean handle(TraceContext context, MutableSpan span) {
span.forEachTag((key, value) ->
value.replaceAll("[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}", "xxx-xx-xxxx")
);
return true; // retain the span
}
});
An example of redaction is here
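To check what the redaction pattern actually matches, it can be tried on plain strings first. The sketch below applies the same regular expression to a map of tags using only the JDK. RedactionSketch and scrub are hypothetical names, and the tag values are made-up sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Applies the redaction pattern from the handler above to a plain map of tags. */
final class RedactionSketch {
  // Three digits, two digits, four digits, separated by hyphens
  static final String SSN_PATTERN = "[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}";

  /** Returns a copy of the tags with anything shaped like an SSN masked. */
  static Map<String, String> scrub(Map<String, String> tags) {
    Map<String, String> result = new LinkedHashMap<>();
    tags.forEach((key, value) ->
        result.put(key, value.replaceAll(SSN_PATTERN, "xxx-xx-xxxx")));
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> tags = new LinkedHashMap<>();
    tags.put("sql.query", "select * from users where ssn = '123-45-6789'"); // sample data
    tags.put("http.path", "/users");
    System.out.println(scrub(tags));
  }
}
```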
While Brave defaults to report 100% of data to Zipkin, many will use a lower percentage like 1%. This is called sampling and the decision is maintained throughout the trace, across requests consistently. Sampling has disadvantages. For example, statistics on sampled data are usually misleading by nature of not observing all durations.
FinishedSpanHandler.alwaysSampleLocal() indicates whether the handler should see all data, or just the data sent to Zipkin. You can override this to return true to observe all operations.
Here's an example of metrics handling:
tracingBuilder.addFinishedSpanHandler(new FinishedSpanHandler() {
@Override public boolean alwaysSampleLocal() {
return true; // since we want to always see timestamps, we have to always record
}
@Override public boolean handle(TraceContext context, MutableSpan span) {
if (namesToAlwaysTime.contains(span.name())) {
registry.timer("spans", "name", span.name())
.record(span.finishTimestamp() - span.startTimestamp(), MICROSECONDS);
}
return true; // retain the span
}
});
An example of metrics handling is here
Brave supports a "current tracing component" concept which should only be used when you have no other means to get a reference. This was made for JDBC connections, as they often initialize prior to the tracing component.
The most recent tracing component instantiated is available via Tracing.current(). There's also a shortcut to get only the tracer via Tracing.currentTracer(). If you use either of these methods, do not cache the result. Instead, look them up each time you need them.
Brave supports a "current span" concept which represents the in-flight operation.
Tracer.currentSpanCustomizer() never returns null and SpanCustomizer is generally a safe object to expose to third-party code to add tags.
Tracer.currentSpan() should be reserved for framework code that cannot reference the span it wants to finish, flush or abandon explicitly.
Tracer.nextSpan() uses the "current span" to determine a parent. This creates a child of whatever is in-flight.
Many frameworks allow you to specify an executor which is used for user callbacks. The type CurrentTraceContext implements all functionality needed to support the current span. It also exposes utilities which you can use to decorate executors.
CurrentTraceContext currentTraceContext = ThreadLocalCurrentTraceContext.create();
tracing = Tracing.newBuilder()
.currentTraceContext(currentTraceContext)
...
.build();
Client c = Client.create();
c.setExecutorService(currentTraceContext.executorService(realExecutorService));
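Conceptually, decorating an executor means capturing the current context on the calling thread and re-establishing it on the thread that runs the task. The following plain-Java sketch shows that idea with a ThreadLocal holding a String. It is an assumption-laden illustration, not how Brave implements it: ContextPropagatingExecutorSketch, CURRENT and wrap are hypothetical names.

```java
import java.util.concurrent.Executor;

/** Hypothetical model of an executor that carries a thread-local context across threads. */
final class ContextPropagatingExecutorSketch {
  /** Stand-in for the current trace context. */
  static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

  /** Wraps an executor so tasks see the context of the thread that submitted them. */
  static Executor wrap(Executor delegate) {
    return task -> {
      String captured = CURRENT.get(); // read on the submitting thread
      delegate.execute(() -> {
        String previous = CURRENT.get();
        CURRENT.set(captured);         // visible while the task runs
        try {
          task.run();
        } finally {
          CURRENT.set(previous);       // restore so the worker thread doesn't leak it
        }
      });
    };
  }

  public static void main(String[] args) {
    Executor newThread = runnable -> new Thread(runnable).start();
    CURRENT.set("trace-1");
    wrap(newThread).execute(() -> System.out.println("task sees " + CURRENT.get()));
  }
}
```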
When writing new instrumentation, it is important to place a span you created in scope as the current span. Not only does this allow users to access it with Tracer.currentSpanCustomizer(), but it also allows customizations like SLF4J MDC to see the current trace IDs.
The easiest way to scope a span is via Tracer.startScopedSpan(name):
ScopedSpan span = tracer.startScopedSpan("encode");
try {
// The span is in "scope" meaning downstream code such as loggers can see trace IDs
return encoder.encode();
} catch (RuntimeException | Error e) {
span.error(e); // Unless you handle exceptions, you might not know the operation failed!
throw e;
} finally {
span.finish(); // always finish the span
}
If doing RPC or otherwise advanced tracing, Tracer.withSpanInScope(Span) scopes an existing span via the try-with-resources idiom. Whenever external code might be invoked (such as proceeding an interceptor or otherwise), place the span in scope like this.
try (SpanInScope ws = tracer.withSpanInScope(span)) {
return inboundRequest.invoke();
} catch (RuntimeException | Error e) {
span.error(e);
throw e;
} finally { // note the scope is independent of the span
span.finish();
}
In edge cases, you may need to clear the current span temporarily. For example, launching a task that should not be associated with the current request. To do this, simply pass null to withSpanInScope.
try (SpanInScope cleared = tracer.withSpanInScope(null)) {
startBackgroundThread();
}
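The scoping behavior used in these examples, including clearing with null, can be modeled as a thread-local value that is swapped on entry and restored on close. This is a minimal sketch in plain Java and not Brave's implementation: ScopeSketch, its Scope interface and withInScope are hypothetical names, and a String stands in for the span.

```java
/** Hypothetical model of a scope that restores the previous value when closed. */
final class ScopeSketch {
  /** Stand-in for the current span. */
  static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

  /** A closeable scope that does not throw checked exceptions. */
  interface Scope extends AutoCloseable {
    @Override void close();
  }

  /** Makes the value current until the returned scope is closed. Null clears it. */
  static Scope withInScope(String value) {
    String previous = CURRENT.get();
    CURRENT.set(value);
    return () -> CURRENT.set(previous); // revert to whatever was current before
  }

  public static void main(String[] args) {
    try (Scope request = withInScope("request-span")) {
      System.out.println("in request: " + CURRENT.get());
      try (Scope cleared = withInScope(null)) {
        System.out.println("cleared: " + CURRENT.get());
      }
      System.out.println("restored: " + CURRENT.get());
    }
  }
}
```

Because each scope only remembers the value it replaced, scopes must be closed on the same thread and in reverse order of opening, which is what try-with-resources guarantees.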
Many libraries expose a callback model as opposed to an interceptor one. When creating new instrumentation, you may find places where you need to place a span in scope in one callback (like onStart()) and end the scope in another callback (like onFinish()).
Provided the library guarantees these run on the same thread, you can simply propagate the result of Tracer.withSpanInScope(Span) from the starting callback to the closing one. This is typically done with a request-scoped attribute.
Here's an example:
class MyFilter extends Filter {
public void onStart(Request request, Attributes attributes) {
// Assume you have code to start the span and add relevant tags...
// We now set the span in scope so that any code between here and
// the end of the request can modify it with Tracer.currentSpan()
// or Tracer.currentSpanCustomizer()
SpanInScope spanInScope = tracer.withSpanInScope(span);
// We don't want to leak the scope, so we place it somewhere we can
// lookup later
attributes.put(SpanInScope.class, spanInScope);
}
public void onFinish(Response response, Attributes attributes) {
// as long as we are on the same thread, we can read the span started above
Span span = tracer.currentSpan();
// Assume you have code to complete the span
// We now remove the scope (which implicitly detaches it from the span)
attributes.remove(SpanInScope.class).close();
}
}
Sometimes you have to instrument a library where there's no attribute namespace shared across request and response. For this scenario, you can use ThreadLocalSpan to temporarily store the span between callbacks.
Here's an example:
class MyFilter extends Filter {
final ThreadLocalSpan threadLocalSpan;
public void onStart(Request request) {
// Assume you have code to start the span and add relevant tags...
// We now set the span in scope so that any code between here and
// the end of the request can see it with Tracer.currentSpan()
threadLocalSpan.set(span);
}
public void onFinish(Response response, Attributes attributes) {
// as long as we are on the same thread, we can read the span started above
Span span = threadLocalSpan.remove();
if (span == null) return;
// Assume you have code to complete the span
}
}
The examples above work because the callbacks happen on the same thread. You should not set a span in scope if you cannot close that scope on the same thread. This may be the case in some asynchronous libraries. Often, you will need to propagate the span directly in a custom attribute. This will allow you to trace the RPC, even if this approach doesn't facilitate use of Tracer.currentSpan() from external code.
Here's an example of explicit propagation:
class MyFilter extends Filter {
public void onStart(Request request, Attributes attributes) {
// Assume you have code to start the span and add relevant tags...
// We can't open a scope as onFinish happens on another thread.
// Instead, we propagate the span manually so at least basic tracing
// will work.
attributes.put(Span.class, span);
}
public void onFinish(Response response, Attributes attributes) {
// We can't rely on Tracer.currentSpan(), but we can rely on explicit
// propagation
Span span = attributes.remove(Span.class);
// Assume you have code to complete the span
}
}
The CurrentTraceContext makes a trace context visible, such that spans aren't accidentally added to the wrong trace. It also accepts hooks like log correlation.
For example, if you use Log4J 2, you can copy trace IDs to your logging context with our decorator:
tracing = Tracing.newBuilder()
.currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
.addScopeDecorator(ThreadContextScopeDecorator.create())
.build()
)
...
.build();
Besides logging, other tools are available. StrictScopeDecorator can help find out when code is not closing scopes properly. This can be useful when writing or diagnosing custom instrumentation.
If you are in a situation where you need to turn off tracing at runtime, invoke Tracing.setNoop(true). This will turn any new spans into "noop" spans, and drop any data until Tracing.setNoop(false) is invoked.
Brave has been built with performance in mind. Using the core Span api, you can record spans in sub-microseconds. When a span is not sampled, there's effectively no overhead (as it is a noop).
Unlike previous implementations, Brave 4 only needs one timestamp per span. All annotations are recorded on an offset basis, using the less expensive and more precise System.nanoTime() function.
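The offset approach can be illustrated with a small clock that takes one wall-clock reading and derives every later timestamp from System.nanoTime(). This is a sketch under assumptions, not Brave's internal clock: OffsetClockSketch and its method names are hypothetical, and timestamps are in epoch microseconds as Zipkin uses.

```java
/** Hypothetical per-span clock: one wall-clock reading plus nanoTime offsets. */
final class OffsetClockSketch {
  final long baseEpochMicros; // the single wall-clock reading, taken at span start
  final long baseTick;        // System.nanoTime() captured at the same moment

  OffsetClockSketch(long baseEpochMicros, long baseTick) {
    this.baseEpochMicros = baseEpochMicros;
    this.baseTick = baseTick;
  }

  /** Takes the one and only wall-clock reading for a span. */
  static OffsetClockSketch start() {
    return new OffsetClockSketch(System.currentTimeMillis() * 1000, System.nanoTime());
  }

  /** Derives a timestamp from the base reading plus elapsed nanoseconds. */
  long epochMicrosAt(long tick) {
    return baseEpochMicros + (tick - baseTick) / 1000;
  }

  public static void main(String[] args) {
    OffsetClockSketch clock = start();
    long annotationTimestamp = clock.epochMicrosAt(System.nanoTime());
    System.out.println(annotationTimestamp - clock.baseEpochMicros + " microseconds after start");
  }
}
```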
Instrumentation problems can lead to scope leaks and orphaned data. When testing instrumentation, use StrictScopeDecorator, as it will throw errors on known scoping problems.
If you see data with the annotation brave.flush, you may have an instrumentation bug. To see more information, set the Java logger named brave.internal.recorder.PendingSpans to FINE level. Do not do this in production as tracking abandoned data incurs higher overhead.
Note: When using log4j2, set the following to ensure log settings apply:
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
When writing unit tests, there are a few tricks that will make bugs easier to find:
- Report spans into a concurrent queue, so you can read them in tests
- Use StrictScopeDecorator to reveal subtle thread-related propagation bugs
- Unconditionally clean up Tracing.current(), to prevent leaks
Here's an example setup for your unit test fixture:
ConcurrentLinkedDeque<Span> spans = new ConcurrentLinkedDeque<>();
Tracing tracing = Tracing.newBuilder()
.currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
.addScopeDecorator(StrictScopeDecorator.create())
.build()
)
.spanReporter(spans::add)
.build();
@After public void close() {
Tracing current = Tracing.current();
if (current != null) current.close();
}
Brave 4 was designed to live alongside Brave 3. Using TracerAdapter, you can navigate between apis, buying you time to update as appropriate.
Concepts are explained below, and there's an elaborate example of interop here.
Note: The last version of Brave 3 types is io.zipkin.brave:brave-core:4.13.4. TracerAdapter should also work with later versions of io.zipkin.brave:brave.
If your code uses Brave 3 apis, all you need to do is use TracerAdapter to create a (Brave 3) Brave. You don't have to change anything else.
Tracing brave4 = Tracing.newBuilder()...build();
Brave brave3 = TracerAdapter.newBrave(brave4.tracer());
Those coding directly to both apis can use TracerAdapter.toSpan to navigate between span types. This is useful when working with client RPC and in-process (local) spans.
If you have a reference to a Brave 3 Span, you can convert like this:
brave3Span = brave3.localSpanThreadBinder().getCurrentLocalSpan();
brave4Span = TracerAdapter.toSpan(brave4, brave3Span);
You can also attach Brave 4 Span to a Brave 3 thread binder like this:
brave3Span = TracerAdapter.toSpan(brave4Span.context());
brave3.localSpanThreadBinder().setCurrentSpan(brave3Span);
Brave 3's ServerTracer works slightly differently than Brave 3's client or local tracers. Internally, it uses a ServerSpan which keeps track of the original sample status. To interop with ServerTracer, use the following hooks:
Use TracerAdapter.getServerSpan to get one of:
- a real span: when a sampled server request preceded this call
- a noop span: when a server request preceded this call, but was not sampled
- null: if no server request preceded this call
brave4Span = TracerAdapter.getServerSpan(brave4, brave3.serverSpanThreadBinder());
// Since the current server span might be empty, you must check null
if (brave4Span != null) {
// use or finish the span
}
If you have a reference to a Brave 4 span, and know it represents a server request, use TracerAdapter.setServerSpan to attach it to Brave 3's thread binder.
TracerAdapter.setServerSpan(brave4Span.context(), brave3.serverSpanThreadBinder());
brave3.serverTracer().setServerSend(); // for example
io.zipkin.brave:brave (Brave 4) is an optional dependency of io.zipkin.brave:brave-core (Brave 3). To use TracerAdapter, you need to declare an explicit dependency, and ensure both libraries are of the same version.
- Never mutate com.twitter.zipkin.gen.Span directly. Doing so will break the above integration.
- The above only works when com.github.kristofa.brave.Brave was instantiated via TracerAdapter.newBrave
Traditionally, Zipkin trace IDs were 64-bit. Starting with Zipkin 1.14, 128-bit trace identifiers are supported. This can be useful in sites that have very large traffic volume, persist traces forever, or are re-using externally generated 128-bit IDs.
If you want Brave to generate 128-bit trace identifiers when starting new root spans, set Tracing.Builder.traceId128Bit(true).
When 128-bit trace ids are propagated, they will be twice as long as before. For example, the X-B3-TraceId header will hold a 32-character value like 163ac35c9f6413ad48485a3953bb6124.
Before doing this, ensure your Zipkin setup is up-to-date, and downstream instrumented services can read the longer 128-bit trace IDs.
Note: this only affects the trace ID, not span IDs. For example, span ids within a trace are always 64-bit.
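To see where the 32 characters come from, a 128-bit trace ID can be thought of as two 64-bit halves, each written as 16 lower-hex characters. The sketch below uses only the JDK. TraceIdHexSketch and toLowerHex are hypothetical names, and treating a zero high half as a 64-bit ID is an assumption made for this illustration.

```java
/** Hypothetical encoder showing how trace IDs become 16 or 32 hex characters. */
final class TraceIdHexSketch {
  /** Encodes a trace ID. A zero high half is treated here as a 64-bit ID. */
  static String toLowerHex(long high, long low) {
    if (high == 0L) return String.format("%016x", low); // 64-bit: 16 characters
    return String.format("%016x%016x", high, low);      // 128-bit: 32 characters
  }

  public static void main(String[] args) {
    System.out.println(toLowerHex(0x163ac35c9f6413adL, 0x48485a3953bb6124L));
    System.out.println(toLowerHex(0L, 0x48485a3953bb6124L));
  }
}
```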
Brave 4's design borrows from past experience and similar open source work. Quite a lot of decisions were driven by portability with Brave 3, and the dozens of individuals involved in that. There are notable new aspects that are borrowed or adapted from others.
Brave 4 allows you to pass around a Span object which can report itself to Zipkin when finished. This is better than using thread contexts in some cases, particularly where many async hops are in use. The Span api is derived from OpenTracing, narrowed to more cleanly match Zipkin's abstraction. As a result, a bridge from Brave 4 to OpenTracing v0.20.2 is relatively little code. It should be able to implement future versions of OpenTracing as well.
Brave 4.19 introduced ScopedSpan, which derives influence primarily from the OpenCensus SpanBuilder.startScopedSpan() api, a convenience type to start work and ensure scope is visible downstream. One main difference here is that we assume the scoped span is used in the same thread, so mark it as such. Also, we reduce the size of the api to help secure users against code drift.
Brave 4.19 added ErrorParser, internally used by Span.error(Throwable). This design incubated in Spring Cloud Sleuth prior to becoming a primary type here. Our care and docs around error handling in general are owed to Nike Wingtips.
Much of Brave 4's architecture is borrowed from Finagle, whose design implies a separation between the propagated trace context and the data collected in process. For example, Brave's MutableSpanMap is the same overall design as Finagle's. The internals of MutableSpanMap were adapted from WeakConcurrentMap.
OpenTracing's Tracer type has methods to inject or extract a trace context from a carrier. While naming is similar, Brave optimizes for direct integration with carrier types (such as http request) vs routing through an intermediate (such as a map). Brave also considers propagation a separate api from the tracer.
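The difference between writing to a carrier directly and routing through a map can be sketched as follows. HeaderSetter, DemoB3Injector and DemoClient are hypothetical names made up for this example, and Brave's real propagation api differs in detail. The sketch assumes Java 11+ for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical types for illustration; Brave's real propagation api differs in detail.
interface HeaderSetter<C> {
  void put(C carrier, String name, String value);
}

final class DemoB3Injector<C> {
  private final HeaderSetter<C> setter;

  DemoB3Injector(HeaderSetter<C> setter) {
    this.setter = setter;
  }

  /** Writes the identifiers straight onto the carrier as lower-hex headers. */
  void inject(long traceId, long spanId, C carrier) {
    setter.put(carrier, "X-B3-TraceId", String.format("%016x", traceId));
    setter.put(carrier, "X-B3-SpanId", String.format("%016x", spanId));
  }
}

final class DemoClient {
  static HttpRequest tracedRequest(long traceId, long spanId) {
    HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create("http://localhost:8080/"));
    // The request builder itself is the carrier, so no intermediate map is allocated.
    new DemoB3Injector<HttpRequest.Builder>(HttpRequest.Builder::setHeader)
        .inject(traceId, spanId, builder);
    return builder.build();
  }
}
```

Because the setter is a plain function over the carrier type, supporting another http client in this sketch only needs a different method reference, not a new injector.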
The first design work of Tracing.currentTracer() started in Brave 3, which was itself influenced by Finagle's implicit Tracer api. This feature is noted as an edge case, for when other means to get a reference to a tracer are impossible. The only instrumentation that needed this was JDBC.
Returning a possibly null reference from Tracing.currentTracer() implies:
- Users never cache the reference returned (noted in javadocs)
- Less code and type constraints on Tracer vs a lazy forwarding delegate
- Less documentation as we don't have to explain what a Noop tracer does
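The consequences listed above can be illustrated with a small stand-alone sketch. DemoTracing, DemoTracer and DemoInstrumentation are hypothetical stand-ins rather than Brave's classes. The sketch assumes the current instance lives in a static reference that is cleared on close, which is why callers look it up on each use and null-check it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins for illustration; these are not Brave's classes.
final class DemoTracer {
  final String serviceName;

  DemoTracer(String serviceName) {
    this.serviceName = serviceName;
  }
}

final class DemoTracing implements AutoCloseable {
  // Most recently created instance, or null when none is active.
  private static final AtomicReference<DemoTracing> CURRENT = new AtomicReference<>();

  private final DemoTracer tracer;

  DemoTracing(String serviceName) {
    this.tracer = new DemoTracer(serviceName);
    CURRENT.set(this);
  }

  /** Returns the tracer of the active instance, or null if there is none. */
  static DemoTracer currentTracer() {
    DemoTracing tracing = CURRENT.get();
    return tracing != null ? tracing.tracer : null;
  }

  /** Clears the static reference, but only if it still points at this instance. */
  @Override public void close() {
    CURRENT.compareAndSet(this, null);
  }
}

final class DemoInstrumentation {
  /** Looks the tracer up on every call instead of caching it in a field. */
  static String describe() {
    DemoTracer tracer = DemoTracing.currentTracer();
    return tracer == null ? "untraced" : "traced by " + tracer.serviceName;
  }
}
```

A cached reference would keep pointing at a closed instance, whereas the per-call lookup falls back to the untraced path once close() has run.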
The first design of CurrentTraceContext was borrowed from ContextUtils in Google's instrumentation-java project. This was possible because of high collaboration between the projects and an almost identical goal. Slight departures include naming (which prefers Guice's naming conventions), and that the object scoped is a TraceContext vs a Span.
Propagating a trace context instead of a span is the right fit for several reasons:
- TraceContext can be created without a reference to a tracer
- Common tasks like making child spans and staining log context only need the context
- Brave's recorder is keyed on context, so there's no feature loss in this choice
The Census project has a concept of a SampledSpanStore. Typically, you configure a span name pattern or choose individual spans for local (in-process) storage. This storage is used to power administrative pages named zPages. For example, Tracez displays spans sampled locally which have errors or crossed a latency threshold.
The "sample local" vocab in Census was re-used in Brave to describe the act of keeping data that won't necessarily end up in Zipkin.
The main difference in Brave is implementation. For example, the sampledLocal flag in Brave is a part of the TraceContext and so can be triggered when headers are parsed. This is needed to provide custom sampling mechanisms. Also, Brave has a small core library and doesn't package any specific storage abstraction, admin pages or latency processors. As such, we can't define specifically what sampled local means as Census could. All it means is that FinishedSpanHandler will see data for that trace context.
Brave had for a long time re-used zipkin's Reporter library, which is like Census' SpanExporter insofar as they both allow pushing a specific format elsewhere, usually to a service. Brave's FinishedSpanHandler is a little more like Census' SpanExporter.Handler insofar as the structure includes the trace context, something we need access to in order to do things like advanced sampling.
Where it differs is that the MutableSpan in Brave is, well, mutable, and this allows you to cheaply change data. Also, it isn't a struct based on a proto, so it can hold references to objects like Exception. This allows us to render data into different formats such as Amazon's stack frames without having to guess what will be needed by parsing up front. As error parsing is deferred, overhead is less in cases where errors are not recorded (as is the case for spans intentionally dropped).
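The effect of deferring error parsing can be shown with a small stand-alone sketch. DemoMutableSpan and DemoErrorRenderer are hypothetical stand-ins, not Brave's MutableSpan or ErrorParser, and the "error" tag format is an assumption chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; this is not Brave's MutableSpan.
final class DemoMutableSpan {
  private Throwable error; // kept as an object reference, not pre-rendered text
  private final Map<String, String> tags = new LinkedHashMap<>();

  void error(Throwable error) { this.error = error; }
  Throwable error() { return error; }
  void tag(String key, String value) { tags.put(key, value); }
  Map<String, String> tags() { return tags; }
}

// Hypothetical renderer: parses the throwable only for spans that will be reported.
final class DemoErrorRenderer {
  int renderCount; // how many times the (potentially costly) parsing actually ran

  void finish(DemoMutableSpan span, boolean sampled) {
    if (!sampled) return; // dropped span: the throwable is never parsed
    Throwable error = span.error();
    if (error == null) return;
    renderCount++;
    String message = error.getMessage();
    // Fall back to the exception's simple class name when it carries no message.
    span.tag("error", message != null ? message : error.getClass().getSimpleName());
  }
}
```

Because the span holds the Throwable itself, a different renderer could later turn the same object into another format, such as stack frames, without the recording code having anticipated it.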
Brave 4's public namespace is more defensive than in the past, using a package accessor design from OkHttp.
RateLimitingSampler was made to allow Amazon X-Ray rules to be expressed in Brave. We considered their Reservoir design. Our implementation differs as it removes a race condition and attempts to be more fair by distributing accept decisions every decisecond.
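The idea of spreading accept decisions across deciseconds can be sketched as below. This is a simplified illustration under stated assumptions, not Brave's RateLimitingSampler: DecisecondRateLimiter is a hypothetical class, it uses a lock instead of whatever concurrency approach Brave takes, and the rounding rule for the per-decisecond budget is chosen only for this example. The clock is injected so that the behaviour can be exercised deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch for illustration; this is not Brave's RateLimitingSampler.
final class DecisecondRateLimiter {
  private static final long NANOS_PER_SECOND = 1_000_000_000L;
  private static final long NANOS_PER_DECISECOND = NANOS_PER_SECOND / 10;

  private final int tracesPerSecond;
  private final LongSupplier nanoClock;
  private long windowStart;
  private int accepted;

  DecisecondRateLimiter(int tracesPerSecond, LongSupplier nanoClock) {
    if (tracesPerSecond < 1) throw new IllegalArgumentException("tracesPerSecond < 1");
    this.tracesPerSecond = tracesPerSecond;
    this.nanoClock = nanoClock;
    this.windowStart = nanoClock.getAsLong();
  }

  /** Returns true if this request fits the budget released so far in the current second. */
  synchronized boolean tryAccept() {
    long now = nanoClock.getAsLong();
    long elapsed = now - windowStart;
    if (elapsed >= NANOS_PER_SECOND) { // start a new one-second window
      windowStart = now;
      accepted = 0;
      elapsed = 0;
    }
    // Deciseconds started so far in this window, from 1 to 10.
    long decisecond = elapsed / NANOS_PER_DECISECOND + 1;
    // Cumulative budget: a proportional share of the per-second limit, rounded up
    // so that rates below 10 per second can still accept in the first decisecond.
    long allowed = (tracesPerSecond * decisecond + 9) / 10;
    if (accepted >= allowed) return false;
    accepted++;
    return true;
  }
}
```

With a limit of 20 per second, this sketch accepts at most 2 requests in the first decisecond and releases 2 more with each decisecond that follows, rather than letting an early burst consume the whole second's budget.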