Skip to content

Improve use of tracing spans in query path #25911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hiltontj opened this issue Jan 24, 2025 · 15 comments
Open

Improve use of tracing spans in query path #25911

hiltontj opened this issue Jan 24, 2025 · 15 comments
Assignees
Labels

Comments

@hiltontj
Copy link
Contributor

The purpose of this issue is to get a better understanding of how influxdb3 uses trace spans in the query path.

To better help diagnose and troubleshoot latency and performance issues in the query path, we may need to better leverage tracing spans in the query path, as what we have seen in results from the perf team, the information we are generating is sparse.

Objectives of this issue

  1. analyze generated trace spans during query execution to see what information is being generated
  2. determine which is being generated from iox_query and which is generated from this codebase
  3. determine if there are any obvious gaps in that information
  4. propose follow-up work
@hiltontj hiltontj added the v3 label Jan 24, 2025
@MaduMitha-Ravi
Copy link

@hiltontj Here are the trace spans that are collected for IOx/InfluxDB 3 projects. These trace spans are customizable and they can be added/removed/updated/renamed per need. The IOx team at times for their works/projects create new or sub spans that can be analyzed for performance.

Note: Below is the exhaustive list, and most of these traces are not captured.

  • IOx querier (ms)
  • execute_stream_partitioned (ms)
  • querier_table (ms)
  • sql_to_logical_plan (ms)
  • IOx ingester (ms)
  • table_snapshot (ms)
  • partition_snapshot (ms)
  • initial_request (ms)
  • stream_data_from_ingester (ms)
  • querier_flight_request_to_one_ingester (ms)
  • querier_table_chunks (ms)
  • querier_ingesters (ms)
  • prune_partitions (ms)
  • get_namespace (ms)
  • flight_planner (ms)
  • planner (ms)
  • get_metadata (ms)
  • loader (ms)
  • load_page_index (ms)
  • load_metadata (ms)
  • cache_metadata (ms)
  • prune_files (ms)
  • ParquetExec (ms)
  • sql_to_physical_plan (ms)
  • query_planning (ms)
  • acquire (ms)
  • permit (ms)
  • create_physical_plan (ms)

@hiltontj
Copy link
Contributor Author

Note: Below is the exhaustive list, and most of these traces are not captured.

Thanks @MaduMitha-Ravi - just to be clear, by "most of these traces are not captured", do you mean that monolith is not capturing them, while IOx is capturing them?

...or do you mean that they are not captured in general by both monolith and IOx?

@MaduMitha-Ravi
Copy link

MaduMitha-Ravi commented Jan 27, 2025

@hiltontj Yes, most of those trace spans are captured in IOx but not with Monolith. I will try to get more details today.

@hiltontj
Copy link
Contributor Author

Got it - that is very helpful. I can look into how IOx uses those spans and see how we can replicate. I think that would be a good starting point to help surface useful information in the tracing for your team.

Beyond that, we can start adding monolith-specific traces.

@MaduMitha-Ravi
Copy link

Here is the sample from IOx from a latest run that returns data,

Measurement Query Name IOx querier (ms) execute_stream_partitioned (ms) querier_table (ms) sql_to_logical_plan (ms) IOx ingester (ms) table_snapshot (ms) partition_snapshot (ms) initial_request (ms) stream_data_from_ingester (ms) querier_flight_request_to_one_ingester (ms) querier_table_chunks (ms) querier_ingesters (ms) prune_partitions (ms) get_namespace (ms) flight_planner (ms) ParquetExec (ms) sql_to_physical_plan (ms) query_planning (ms) acquire (ms) permit (ms) create_physical_plan (ms)
AVG s-q1 85.183 60.245 1.058 1.414 7.559 1.011 0.790 5.252 11.278 16.584 18.905 16.640 0.101 1.407 21.808 58.870 21.953 82.721 0.005 60.197 20.454

@MaduMitha-Ravi
Copy link

@hiltontj I do understand you have so many things that needs your focus, but wondering if there is an update on the Traces.

@waynr
Copy link
Contributor

waynr commented Feb 14, 2025

@pauldix I've got to the point where I roughly understand how tracing spans work in IOx & monolith and I've got a setup where I can send traces to jaeger in OSS (see #26006) but with respect to your comment in Slack:

Mostly, we need to know from the traces when queries are pulling Parquet files from cache or from object store. Then there's probably a bunch of other stuff, but I think that's the most important.

I think this might be tough to do if you want the cache/object store spans to inherit the parent span of the calling context since all object store access and caching is mediated through the ObjectStore trait, for example:

The reason this makes things difficult is that we need to be able to pass either a Span (https://github.com/influxdata/influxdb_iox/blob/7419ce7dc877d9073344e6c77221780142753322/trace/src/span.rs#L54) or a SpanContext (https://github.com/influxdata/influxdb_iox/blob/7419ce7dc877d9073344e6c77221780142753322/trace/src/ctx.rs#L44) from the caller but the ObjectStore trait methods are more or less fixed as they are (ie they don't take a Span or SpanContext).

I looked through the memory cache implementation in IOx and saw a similar lack of trace span reporting. Instead they seem to rely simply on reporting cache hit/miss/error metrics:

There are two approaches i can think of to work around this:

Separate Span Hierarchy for cached ObjectStore implementation

With this approach we would create a separate span hierarchy just for object store / cache accesses when initializing the memory cache object store. The main problem with this is that when we are troubleshooting a problematic query span there will be nothing (as far as i can tell) that would tie the objectstore/cache access to the problematic query.

Use object_store::Attribute passed to ObjectStore.*_opt methods

With this approach, we would initialize the MemCachedObjectStore with a telemetry collector, then use that and a parent span ID passed in to the ObjectStore.*_opt methods (eg get_opt) using a Attribute::Metadata. This would require the following work:

  • Replace all relevant ObjectStore method usage through in the query call stack with the *_opt variants
    • I'm guessing this would mostly be get_opt, get_range, and get_ranges methods
  • Update those methods on MemCachedObjectStore to actually try hitting the cache and to also construct a SpanContext using the MemCachedObjectStore's trace collector and a parent span ID obtained from Attribute::Metadata and build a span reporter from that.

I'll tentatively start with the second approach to see how feasible it is but keep an eye on this issue for feedback.

@pauldix
Copy link
Member

pauldix commented Feb 14, 2025

Any chance we can update the object store trait to take an optional span? Seems like that's what should be happening. Tracking metrics seems less than ideal if we can actually get tracing on it.

@waynr
Copy link
Contributor

waynr commented Feb 14, 2025

Any chance we can update the object store trait to take an optional span?

Maybe if we made the changes purely additive (ie new methods on the trait with default no-op/error implementations to avoid forcing other implementers of the trait to make supporting changes), but even if we could get buy-in from maintainers (many of whom work at InfluxData as well, so that could happen) there is the problem that the tracing spans we are talking about are very bespoke to InfluxData projects: https://github.com/influxdata/influxdb3_core/tree/main/trace

I think this is a problem because the object_store crate where the trait is defined is relatively independent from any other InfluxData crates. We might have to do something with the trace, trace_http, and trace_exporters crates like relocate them to be maintained alongside the object store repo: https://github.com/apache/arrow-rs

I think the simplest approach is to use the *_opt methods like I described in my previous comment -- I'll try to get a PR up demonstrating that soon.

@waynr
Copy link
Contributor

waynr commented Feb 14, 2025

So I just realized it's likely that we're passing these Arc<dyn ObjectStore> instances around to other libraries that we don't directly control in this repo -- I see that happening when initializing the parquet_file::ParquetStorage that's passed to the iox_query::exec::Executor constructor. Because of this, I don't think it will be as simple as replacing calls to ObjectStore.get with ObjectStore.get_opts like I described.

Metrics might be the best thing we can get without a major refactor spanning several repos.

@waynr
Copy link
Contributor

waynr commented Feb 14, 2025

Okay I spent some time looking around IOx code to try to understand how spans like prune_partitions are recorded under the hood of a trait object -- in that case it's an impl of TableProvider. I was looking at this to try to understand whether that particular span has any analogs here in monolith and realized it is using a type system trick to store arbitrary data (like Spans) from the calling context.

Specifically, the more interesting TableProvider methods (like scan) take a Session trait object which provides access to a SessionConfig instance that have support for custom configuration. This custom configuration allows the calling context of TableProvider trait objects to pass in arbitrary 'static lifetime values that can be retrieve internally by the implementation.

A similar approach could work for the object_store::ObjectStore trait. We would have to do roughly the following:

  • introduce a new type, object_store::ObjectStoreSession that has a dynamically-typed map of type IDs to Arc<dyn Any + Send + Sync + 'static> that we would downcast to a concrete type during retrieval
  • introduce a new set of optional methods like get_with_session to the ObjectStore trait
    • these methods would be optional by virtue of default implementations that ignore the session and delegate to the corresponding required methods
  • refactor code in https://github.com/influxdata/influxdb3_core to prefer the get_with_session wherever the calling context has a span that can be passed in
  • implement get_with_session in MemCachedObjectStore to record different spans for each branch of the code (ie retrieving from memory cache vs object store)

@waynr
Copy link
Contributor

waynr commented Feb 14, 2025

I think the simplest approach is to use the *_opt methods like I described in my previous comment -- I'll try to get a PR up demonstrating that soon.

By the way, I realized the *_opt approach wouldn't work in this case because https://docs.rs/object_store/latest/object_store/struct.GetOptions.html doesn't actually accept the same kind of arbitrary user metadata as https://docs.rs/object_store/latest/object_store/struct.PutOptions.html

@MaduMitha-Ravi
Copy link

@waynr Do we have an update on the Trace spans work?

@waynr
Copy link
Contributor

waynr commented Mar 3, 2025

@MaduMitha-Ravi I just merged this PR which adds tracing spans to the compactor: https://github.com/influxdata/influxdb_pro/pull/538

The object_store crate also has a new extensions feature that will let us add properly-parented spans to queries that access the object store -- this will let us discern whether a query involves object store accesses over the network or the in-memory parquet file cache is used instead.

@waynr
Copy link
Contributor

waynr commented Mar 3, 2025

To be clear with respect to the object_store work, I still need to update a datafusion PR to pass on extensions from the session config to the object store API calls that it makes (see apache/datafusion#14804)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants