Support application traces #67

jordap · 2024-07-26T23:32:17Z

Overview

This is a rewrite of user annotations that attempts to address some of the issues with the previous file/json approach. At a high level, this PR introduces the following changes:

Add a new "trace" module to trace application regions/spans.
Change the communication between application and the Omnistat collector, from a regular json file to a zeromq socket. The collector sets up a socket to receive messages, and the application-level tracer simply sends (non-blocking) start/end messages to that socket. There is a single socket per collector instance, and multiple processes can send trace messages to the same socket (preferably with different trace IDs to differentiate them).
The collector keeps track of the state of the trace.
Adapted the old annotation script to use the new trace module.

Open Questions

There are still a number of details that need further consideration.

Annotations vs Traces

Do we need annotations and traces? Or are traces enough? We can generate annotations from traces, but we need to find the right mapping from traces to annotations (some ideas are discussed below as part of other topics).

Unstructured vs Hierarchical

Do we want to enforce any structure in our traces to make it easier to display results?

At the moment spans are totally independent, no particular structure is enforced, and they can overlap (so we can potentially trace async operations, althought that's probably not our main goal). That said, it's still possible to display hierarchies by re-using the same markers/labels. For example, the following trace is displayed hierarchically by reusing the "EPOCH" marker for two different spans:

Unstructured is more flexible but also harder to visualize. The main motivation to enforce a particular structure would be to make visualization easier.

Flat vs Multi-level Traces

It's very challenging to display complex traces in system-wide mode because post-processing is very limited. For example, it's hard to sort the different spans in the trace, particularly when there's a hierarchy and/or multiple processes/ranks.

One option to address this issue in system-wide visualizations is to flatten the traces so we only display the most recent (or top) marker instead of the entire hierarchy. For example, we can display a flatter trace as follows:

Even if we simplify system-wide traces, we can keep the more complex/advanced visualizations for user-mode Grafana and/or reports.

Limit the number of trace messages per sample

At the moment the collector will attempt to receive all messages in the queue in every sample. This can be easily abused. I'd like to limit the number of messages that the collector will attempt to read per sample. It probably makes sense to make this number configurable.

We can limit by time or by number of messages. For reference, the first pull takes approximately ~40-50us, and subsequent pulls in the same iteration/sample take ~4us. In worst case scenarios, that can translate to ~0.55ms for 100 messages, or ~5.05ms for 1000 messages.

It probably makes sense to allow at least 128-256 messages (or 1 message per core in a decent CPU).

Configuration for user mode

The current approach will only send trace messages to a single Omnistat collector instance. That works fine if we are only working in either system mode or user mode. If we are using Omnistat user mode in a cluster with Omnistat system mode, we likely want to traces reported to the user mode instance. (We could try a PUB/SUB approach to communicate with multiple Omnistat instances, but I don't think it's worth the additional complexity for this particular use case).

This raises the question of how to configure the socket path in the application. I don't think it makes sense to make applications aware of the Omnistat configuration file. I'm thinking of making this configuration available to applications as an environment variable, e.g. OMNISTAT_TRACE_PATH.

Tasks

Support for fine-grained multi-rank traces
Split tracing from rms collector.
Merge dashboard.
Support for flat traces.

jordap · 2024-07-30T06:22:58Z

@koomie pyzmq is the only additional package for this PR. Installation takes approximately ~900KB and doesn't pull any extra dependencies (other than itself).

Signed-off-by: Jordà Polo <[email protected]>

jordap requested a review from koomie July 26, 2024 23:32

jordap added the enhancement New feature or request label Jul 30, 2024

jordap force-pushed the jorda/tracing branch 2 times, most recently from d23e0da to df2c710 Compare August 2, 2024 21:15

jordap force-pushed the jorda/tracing branch 3 times, most recently from eae722b to b4f2460 Compare September 17, 2024 18:00

jordap self-assigned this Oct 2, 2024

jordap added 6 commits October 15, 2024 09:55

Rewrite annotations using zmq to communicate with the collector

6f57f70

Signed-off-by: Jordà Polo <[email protected]>

Add pyzmq to dependencies

ee45100

Signed-off-by: Jordà Polo <[email protected]>

Add application trace panel to usermode dashboard

a0e3b53

Signed-off-by: Jordà Polo <[email protected]>

Update annotation utility to use new tracing mechanism

6e1b70a

Signed-off-by: Jordà Polo <[email protected]>

Fix format

d9ad146

Signed-off-by: Jordà Polo <[email protected]>

Fix call to method

b148ae7

Signed-off-by: Jordà Polo <[email protected]>

jordap force-pushed the jorda/tracing branch from b4f2460 to b148ae7 Compare October 15, 2024 16:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support application traces #67

Support application traces #67

jordap commented Jul 26, 2024 •

edited

Loading

jordap commented Jul 30, 2024

Support application traces #67

Are you sure you want to change the base?

Support application traces #67

Conversation

jordap commented Jul 26, 2024 • edited Loading

Overview

Open Questions

Annotations vs Traces

Unstructured vs Hierarchical

Flat vs Multi-level Traces

Limit the number of trace messages per sample

Configuration for user mode

Tasks

jordap commented Jul 30, 2024

jordap commented Jul 26, 2024 •

edited

Loading