-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(telemetry): cross-component async write tracing #12405
feat(telemetry): cross-component async write tracing #12405
Conversation
69db86b
to
18495bb
Compare
18495bb
to
2589dfe
Compare
2589dfe
to
bc50082
Compare
* created TraceContext for opentelemetry spans * added tracing header/cookies to control logging trace info * support legacy dropwizard tracing using opentelemetry * added smoke-tests for tracing conditions
bc50082
to
d101d78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of changes here so there was some quick scanning in review, but everything was looking good from what I saw. If there's any specific parts you were looking for more feedback on let me know and we can go through it, but very excited to get this in!
Probably worth a demo to at least platform team (maybe recorded for CS) on how to utilize it. Very cool stuff and nice work!
metadata-io/src/main/java/com/linkedin/metadata/service/UpdateIndicesService.java
Outdated
Show resolved
Hide resolved
metadata-io/src/main/java/com/linkedin/metadata/trace/KafkaTraceReader.java
Outdated
Show resolved
Hide resolved
// Process records for each aspect name we haven't found yet | ||
for (String aspectName : aspectNames) { | ||
if (!results.containsKey(aspectName)) { | ||
var matchingRecord = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally avoid var in the rest of the codebase so a bit weird to see it here, but I think it's fine since we don't need to run metadata-io in Java 11/8 compat
...mce-consumer/src/main/java/com/linkedin/metadata/kafka/MetadataChangeProposalsProcessor.java
Show resolved
Hide resolved
Failing cypress is not reproduce-able locally. |
412600a
into
datahub-project:master
Key Features:
Async write requests (OpenAPI/Rest.li) will include a
trace id
generated by OpenTelemetry and returned via a standard http response header used for tracing:traceparent
.SystemMetadata
, available from OpenAPIv3, will also include thetrace id
in the properties with keytelemetryTraceId
trace id
can be used to track the outcome from a write request with information about its success/failure or pending status.trace id
from GMS to the consumers (mce-consumer
andmae-consumer
).00062c2b698bcb28e92508f8f311802d
telemetryTraceId
value in the response.trace id
and the URN/aspect.The Failed MCP topic will now store more detailed error messages and the trace API will fetch these errors in order to not only return failure status, but detailed information on why it failed.
For debugging, a cookie or special header, can be used with any request (read/write/sync/async) using any API (Graphql/OpenAPI/Rest.li) and will trigger logging of the spans with detailed timing of the request in the logs.
X-Enable-Trace-Log
with valuetrue
enable-trace-log
with valuetrue
Design Considerations:
For the initial implementation no specific telemetry infrastructure is required, however existing environment variables for OpenTelemetry can continue to be used and will export the new spans if configured.
trace id
or related timestamps.trace id
is stored insystemMetadata
in both SQL and ES. For ES specifically, the presence of thetrace id
in the system metadata index is used as a proxy to determine a successful write to ES.Trace performance
skipCache
is included as a flag to bypass the cache.This PR updates OpenTelemetry and transitions from the DropWizard based timing instrumentation to using OpenTelemetry. The existing metrics for DropWizard are forwarded from OpenTelemetry preserving the existing naming scheme.
For easy access, the
OperationContext
now includes aTraceContext
to facilitate the integration of OpenTelemetry into any part of the code base.Checklist