
RFC: Task Hook Inheritance #7306


Open
Noah-Kennedy opened this issue May 5, 2025 · 10 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. C-musing Category: musings about a better world C-proposal Category: a proposal and request for comments M-runtime Module: tokio/runtime M-task Module: tokio/task

Comments

@Noah-Kennedy
Contributor

Noah-Kennedy commented May 5, 2025

Problem

Task hooks are the ideal mechanism for users of Tokio to implement their own telemetry. By exposing actions such as spawning, termination, and polling, they let users instrument both the lifecycle and the polling loop of individual tasks.

Unfortunately, the current API does not offer a great way to track state between hooks. We offer a struct containing an id that can be used as an index into an out-of-band map, but that requires expensive synchronization which is wholly unnecessary - the operations we expose via hooks are already implicitly synchronized by the runtime. More importantly, it is often necessary to track a tree of tasks together.
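
Concretely, the out-of-band pattern looks something like this std-only model (the names here are illustrative, not tokio API): every hook invocation pays for a global lock even though the runtime already serializes the hooks that run for any single task.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical per-task telemetry record (illustration only).
#[derive(Default)]
struct TaskStats {
    polls: u64,
}

// Out-of-band map keyed by task id: every hook invocation has to take
// this lock, even though the runtime already serializes the hooks that
// run for any single task.
static STATS: OnceLock<Mutex<HashMap<u64, TaskStats>>> = OnceLock::new();

fn stats() -> &'static Mutex<HashMap<u64, TaskStats>> {
    STATS.get_or_init(|| Mutex::new(HashMap::new()))
}

// What a user-written before-poll hook body ends up doing today.
fn before_poll(task_id: u64) {
    stats().lock().unwrap().entry(task_id).or_default().polls += 1;
}

fn polls_for(task_id: u64) -> u64 {
    stats()
        .lock()
        .unwrap()
        .get(&task_id)
        .map(|s| s.polls)
        .unwrap_or(0)
}
```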

For example, Cloudflare has existing instrumentation for our legacy proxy stack that tracks the time an event loop spends processing a given request, which we would like to carry over to the new Rust proxy instances we are spinning up. We can safely assume that the sub-tree of tasks stemming from a top-level task is related, so if hooks could somehow track state across a tree of tasks, we could implicitly instrument a whole sub-tree. The alternative is invasive: modifying every crate in the tree that spawns Tokio tasks so that context information gets passed along, and centrally recording which task ids are children of which parents.

Solution

#7197 overhauls task hooks to use a harness trait object which provides implementations of task hooks. The user-implemented type can store whatever state the user wants to store. Propagation of parent-to-child state is handled by a hook which allows parent task harnesses to return a hook harness for a child which is being spawned. So long as default implementations are provided, adding new hooks will not be a breaking change.
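
The shape of that design can be sketched with std types alone. Note the trait and method names below are assumptions modeled on this description, not the actual definitions in #7197:

```rust
use std::sync::Arc;

// Hypothetical trait modeled on the harness design described above; the
// real trait name and method signatures are defined in the PR, not here.
// Default method bodies are what make adding new hooks non-breaking.
trait TaskHookHarness: Send + 'static {
    fn before_poll(&mut self) {}
    fn after_poll(&mut self) {}
    fn on_task_terminate(&mut self) {}

    // Invoked when this task spawns a child; returning a harness here is
    // the parent-to-child propagation mechanism.
    fn on_child_spawn(&mut self) -> Option<Box<dyn TaskHookHarness>> {
        None
    }
}

// Example: carry a shared request id down an entire sub-tree of tasks.
struct RequestTelemetry {
    request_id: Arc<u64>,
    polls: u64,
}

impl TaskHookHarness for RequestTelemetry {
    fn before_poll(&mut self) {
        self.polls += 1;
    }

    fn on_child_spawn(&mut self) -> Option<Box<dyn TaskHookHarness>> {
        // The child inherits the request id with no out-of-band map and
        // no extra locking.
        Some(Box::new(RequestTelemetry {
            request_id: Arc::clone(&self.request_id),
            polls: 0,
        }))
    }
}
```

Because the harness is a user-defined type, per-task state lives directly in the harness rather than in an external map keyed by task id.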

Alternatives

I attempted to construct a mechanism that would let users pass the runtime builder a set of hooks sharing a generic state type, which tokio would box so that the existing API could be kept. That approach could not be made to work with hooks passed to new tasks as explicit overrides of inherited hooks at spawn time, and was a nightmare to deal with. Another alternative explored was passing Box<dyn Any> to hooks to allow them to store whatever they wanted.

Both of those also required adding a hook invoked when a parent task spawns a child, so that parent tasks could marshal data via a user-defined out-of-band mechanism which the child task's spawn hook would pull in. The harness system accomplishes the same thing, works quite similarly under the hood, and is much simpler to use.

@Noah-Kennedy Noah-Kennedy self-assigned this May 5, 2025
@Noah-Kennedy Noah-Kennedy added M-runtime Module: tokio/runtime C-feature-request Category: A feature request. C-proposal Category: a proposal and request for comments A-tokio Area: The main tokio crate M-task Module: tokio/task C-musing Category: musings about a better world labels May 5, 2025
@jlizen
Member

jlizen commented May 6, 2025

This seems very useful to me. Just earlier today, I was helping somebody figure out how to instrument a task to accomplish very similar goals.

Their specific use case: a hyper-based server handler returned a body that contained a stream of data from a secondary backend service.

They wanted to know what portion of end-to-end latency came from their backend service (i.e. the streaming body future), versus a slow client taking a long time to read from the socket, versus scheduling delay or other overhead. So the on_top_level_spawn, before_poll, after_poll, and on_task_terminate hooks would all be handy for calculating metrics. Probably on_child_spawn too, to let them differentiate a slow client from scheduling delay and other overhead - though that probably belongs in Hyper as a turnkey thing.

Presumably the proposed API would also be useful for simplifying tokio_metrics::TaskMetrics, as well as for implementing tokio-rs/tokio-metrics#75 (task-specific rather than aggregated metrics).

The main thing I DON'T see that might be nice would be an on_wake hook. You need that to capture scheduled vs idle duration. Today tokio_metrics handles this via an ArcWake implementation. You could do that with these hooks too, so it's not the end of the world not to have it, but it might improve ergonomics.
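
For reference, the waker-wrapping workaround mentioned above can be modeled with std's task::Wake trait instead of the futures crate's ArcWake (this is a sketch of the pattern, not tokio_metrics' actual code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

// A waker wrapper that records wake-ups, analogous to the ArcWake
// approach tokio_metrics uses today. Comparing the time of the recorded
// wake against the next before_poll yields scheduled duration.
struct CountingWaker {
    inner: Waker,
    wakes: AtomicU64,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::Relaxed);
        self.inner.wake_by_ref();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::Relaxed);
        self.inner.wake_by_ref();
    }
}

// A no-op waker standing in for the runtime's real waker.
struct Noop;
impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}
```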

@conradludgate
Contributor

A few things came to mind regarding the current impl:

  1. on_child_spawn feels quite limiting. I can call spawn_with_hooks to set custom hooks, but I can't have variable behaviour in on_child_spawn.
  2. While I can use spawn_with_hooks to configure the hooks for different tasks being spawned, I cannot access the current task's hook.
  3. Is it possible to set a task hook for the block_on(..) task?
  4. Is it possible to use task hooks across runtimes, via Handle::spawn(..)?
  5. Should there be hooks for spawn_blocking?
  6. Can the on_task_terminate method get more information about the termination status: cancelled/panic/completed?

Some changes I think I'd want as a result:


If using tokio::task::Builder, it would be nice to be able to specify task arguments. I'm not sure what format this would be in - perhaps just Box<dyn Any>, where the task hook must downcast. Additionally, getting the task name would also be nice.


If using spawn_with_hooks, it would be nice to borrow the current hook. Something like:

fn with_current_task_hook<F>(f: F)
where
    F: FnOnce(&mut dyn Any);
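
A sketch of how such an accessor could behave, with a thread-local standing in for the runtime-owned per-task state (all names beyond the proposed signature are hypothetical):

```rust
use std::any::Any;
use std::cell::RefCell;

// Stand-in for runtime-owned per-task hook state.
thread_local! {
    static CURRENT_HOOK: RefCell<Option<Box<dyn Any>>> = RefCell::new(None);
}

// The proposed accessor: borrow the current task's hook, if any.
fn with_current_task_hook<F>(f: F)
where
    F: FnOnce(&mut dyn Any),
{
    CURRENT_HOOK.with(|slot| {
        if let Some(hook) = slot.borrow_mut().as_mut() {
            f(hook.as_mut());
        }
    });
}

// A caller downcasts to its own concrete hook type to touch its state.
struct MyHook {
    spans_entered: u32,
}
```

The &mut dyn Any signature means each caller must know (and downcast to) the concrete hook type it installed, which is the main ergonomic cost of this shape.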

@LloydW93

LloydW93 commented May 7, 2025

They wanted to know what portion of end-to-end latency came from their backend service (i.e. the streaming body future), versus a slow client taking a long time to read from the socket, versus scheduling delay or other overhead.

That's exactly what we want to know too with the service we're building that led to us bothering @Noah-Kennedy :)

We currently have an event loop based service, which we are migrating away from, where we measure the timestamp at:

  1. The start of each cycle (equivalent to when tokio would consider a task awoken)
  2. When active work on the task starts in the process (equivalent to tokio starting to poll, i.e. at before_poll proposed here)
  3. When active work on the task stops in the process (equivalent to after_poll proposed here)
  4. When the main IO FDs have full write buffers or empty read buffers

and then calculate a bunch of durations from all that to get answers to the kinds of questions you proposed above.

The main thing I DON'T see that might be nice, would be an on_wake hook. You need that to capture scheduled vs idle duration.

When we were initially discussing this proposal, the idea was that we would also get, when each hook is invoked, a pointer with a bunch of useful timestamps, such as:

  1. More-or-less now (tokio has just checked the time, saving another clock_gettime call)
  2. When the task was woken / when the io driver last checked for events
  3. The interval between the previous two io checks, making it possible to bound the best- and worst-case delay between a socket becoming readable/writable and the runtime noticing that the task can be woken

and also potentially on after_poll a yield reason such as BUDGET_EXHAUSTED, IO, YIELD_NOW.

Those timestamps would remove the need for an additional on_wake hook, which from my understanding would often end up being executed as part of io driver work - a hot path shared across all the threads, so you'd want to avoid doing work there.
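
Restating that wishlist as types (every name here is hypothetical; none of this exists in tokio today):

```rust
use std::time::{Duration, Instant};

// Hypothetical data a hook like after_poll could receive; these names
// just restate the timestamps listed above.
#[derive(Clone, Copy)]
struct PollTimestamps {
    now: Instant,                // the runtime just read the clock anyway
    woken_at: Instant,           // when the task was woken
    io_check_interval: Duration, // gap between the last two io-driver checks
}

// Hypothetical yield reason reported on after_poll.
#[derive(Clone, Copy, PartialEq, Debug)]
enum YieldReason {
    BudgetExhausted,
    Io,
    YieldNow,
}

impl PollTimestamps {
    // Scheduled delay: how long the task was runnable but not yet polled,
    // which is what an on_wake hook would otherwise be needed for.
    fn scheduled_for(&self) -> Duration {
        self.now.duration_since(self.woken_at)
    }
}
```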

@LloydW93

LloydW93 commented May 7, 2025

2. While I can use spawn_with_hooks to configure the hooks for different tasks being spawned, I cannot access the current task hook.

The idea is that you would have something like an Arc<TaskPollCtx> stored on both your TaskHookHarness implementation and your task, to pass data between them.
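
That sharing pattern can be sketched with std types (TaskPollCtx is the hypothetical name from above, not a real tokio type):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical shared context: one handle lives in the hook harness,
// the other inside the task's future, so the two can exchange data.
#[derive(Default)]
struct TaskPollCtx {
    poll_count: AtomicU64,
}

// Hypothetical harness holding one of the two handles.
struct Harness {
    ctx: Arc<TaskPollCtx>,
}

impl Harness {
    fn before_poll(&mut self) {
        self.ctx.poll_count.fetch_add(1, Ordering::Relaxed);
    }
}
```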

6. Can the on_task_terminate method get more information about the termination status: cancelled/panic/completed?

That'd be a great addition as we iterate, imo.

@conradludgate
Contributor

and also potentially on after_poll a yield reason such as BUDGET_EXHAUSTED, IO, YIELD_NOW.

That reminds me - I brought up this hooks overhaul initiative a few months back with my colleagues. One suggested that before_poll could influence the task budget. This goes beyond just hooks, and doesn't need to be in the first pass since the API is extensible, but it would be interesting to explore how hooks could be used not just for monitoring, but also for configuring task behaviour.

@carllerche
Member

@conradludgate interesting, what use cases do you have for this?

@LloydW93

LloydW93 commented May 9, 2025

how hooks could not just be for monitoring, but configuring task behaviour

Oh yes, I thought about this a while back too and completely forgot about it!

The use case I was thinking of is some fairly basic prioritisation logic, where you could group tasks into priority types and when you're under resource pressure just decide to not schedule a certain category of task at all, so before_poll could say "actually, no, don't wake again for another second", or similar.
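
A purely speculative sketch of what that could look like - no such API exists in tokio, and every name below is made up for illustration:

```rust
use std::time::Duration;

// Speculative: a hook return value that could tell the scheduler to
// defer a low-priority task while the runtime is under pressure.
enum PollDecision {
    Proceed,
    DeferFor(Duration),
}

// A user-defined grouping of tasks into priority categories.
#[derive(PartialEq)]
enum Priority {
    Low,
    High,
}

fn before_poll(priority: Priority, under_pressure: bool) -> PollDecision {
    if under_pressure && priority == Priority::Low {
        // "actually, no, don't wake again for another second"
        PollDecision::DeferFor(Duration::from_secs(1))
    } else {
        PollDecision::Proceed
    }
}
```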

@Darksonn
Contributor

I think that we should consider whether the implementation is compatible with placing the task hook object directly inside the Tokio task, and not behind a Box<dyn TaskHook>. For something that goes on every task, that's probably important for performance.

@Noah-Kennedy
Contributor Author

I think that we should consider whether the implementation is compatible with placing the task hook object directly inside the Tokio task, and not behind a Box<dyn TaskHook>. For something that goes on every task, that's probably important for performance.

That's definitely better from a performance standpoint.

From an API standpoint, what would an API that supports this look like?

@Noah-Kennedy
Contributor Author

Also, does this block the merge of the initial PR?

I don't think it should.


6 participants