
RFC: Task Hook Inheritance #7306


Open
Noah-Kennedy opened this issue May 5, 2025 · 10 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. C-musing Category: musings about a better world C-proposal Category: a proposal and request for comments M-runtime Module: tokio/runtime M-task Module: tokio/task

Comments

@Noah-Kennedy
Contributor

Noah-Kennedy commented May 5, 2025

Problem

Task hooks are the ideal mechanism for users of Tokio to implement their own telemetry. By exposing actions such as spawning, termination, and polling, they let users instrument both the lifecycle and the polling loop of individual tasks.

Unfortunately, the current API does not offer a great way to track state between hooks. We offer a struct containing an id that can be used as an index into an out-of-band map, but that requires expensive synchronization which is wholly unnecessary - the operations we expose via hooks are already implicitly synchronized by the runtime. More importantly, it is often necessary to track a tree of tasks together.
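
Concretely, the out-of-band pattern looks something like this std-only model (the names here are illustrative, not tokio API): every hook invocation pays for a global lock even though the runtime already serializes the hooks that run for any single task.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical per-task telemetry record (illustration only).
#[derive(Default)]
struct TaskStats {
    polls: u64,
}

// Out-of-band map keyed by task id: every hook invocation has to take
// this lock, even though the runtime already serializes the hooks that
// run for any single task.
static STATS: OnceLock<Mutex<HashMap<u64, TaskStats>>> = OnceLock::new();

fn stats() -> &'static Mutex<HashMap<u64, TaskStats>> {
    STATS.get_or_init(|| Mutex::new(HashMap::new()))
}

// What a user-written before-poll hook body ends up doing today.
fn before_poll(task_id: u64) {
    stats().lock().unwrap().entry(task_id).or_default().polls += 1;
}

fn polls_for(task_id: u64) -> u64 {
    stats()
        .lock()
        .unwrap()
        .get(&task_id)
        .map(|s| s.polls)
        .unwrap_or(0)
}
```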

For example, Cloudflare has existing instrumentation for our legacy proxy stack that tracks the time an event loop spends processing a given request, which we would like to carry over to the new Rust proxy instances we are spinning up. We can safely assume that the sub-tree of tasks stemming from a top-level task is related, so if hooks could somehow track state across a tree of tasks, we could implicitly instrument a whole sub-tree. The alternative is invasive: modifying every crate in the tree that spawns Tokio tasks so that context information gets passed along, and centrally recording which task ids are children of which parents.

Solution

#7197 overhauls task hooks to use a harness trait object which provides implementations of task hooks. The user-implemented type can store whatever state the user wants to store. Propagation of parent-to-child state is handled by a hook which allows parent task harnesses to return a hook harness for a child which is being spawned. So long as default implementations are provided, adding new hooks will not be a breaking change.
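
The shape of that design can be sketched with std types alone. Note the trait and method names below are assumptions modeled on this description, not the actual definitions in #7197:

```rust
use std::sync::Arc;

// Hypothetical trait modeled on the harness design described above; the
// real trait name and method signatures are defined in the PR, not here.
// Default method bodies are what make adding new hooks non-breaking.
trait TaskHookHarness: Send + 'static {
    fn before_poll(&mut self) {}
    fn after_poll(&mut self) {}
    fn on_task_terminate(&mut self) {}

    // Invoked when this task spawns a child; returning a harness here is
    // the parent-to-child propagation mechanism.
    fn on_child_spawn(&mut self) -> Option<Box<dyn TaskHookHarness>> {
        None
    }
}

// Example: carry a shared request id down an entire sub-tree of tasks.
struct RequestTelemetry {
    request_id: Arc<u64>,
    polls: u64,
}

impl TaskHookHarness for RequestTelemetry {
    fn before_poll(&mut self) {
        self.polls += 1;
    }

    fn on_child_spawn(&mut self) -> Option<Box<dyn TaskHookHarness>> {
        // The child inherits the request id with no out-of-band map and
        // no extra locking.
        Some(Box::new(RequestTelemetry {
            request_id: Arc::clone(&self.request_id),
            polls: 0,
        }))
    }
}
```

Because the harness is a user-defined type, per-task state lives directly in the harness rather than in an external map keyed by task id.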

Alternatives

I attempted to construct a mechanism that would let users pass the runtime builder a set of hooks sharing a generic state type, which tokio would box so that the existing API could be kept. That approach could not be made to work with hooks passed to new tasks as explicit overrides of inherited hooks at spawn time, and was a nightmare to deal with. Another alternative explored was passing Box<dyn Any> to hooks to allow them to store whatever they wanted.

Both of those also required adding a hook invoked when a parent task spawns a child, so that parent tasks could marshal data via a user-defined out-of-band mechanism which the child task's spawn hook would pull in. The harness system accomplishes the same thing, works quite similarly under the hood, and is much simpler to use.

@Noah-Kennedy Noah-Kennedy self-assigned this May 5, 2025
@Noah-Kennedy Noah-Kennedy added M-runtime Module: tokio/runtime C-feature-request Category: A feature request. C-proposal Category: a proposal and request for comments A-tokio Area: The main tokio crate M-task Module: tokio/task C-musing Category: musings about a better world labels May 5, 2025
@jlizen
Member

jlizen commented May 6, 2025

This seems very useful to me. Just earlier today, I was helping somebody figure out how to instrument a task to accomplish very similar goals.

Their specific use case: a hyper-based server handler returned a body that contained a stream of data from a secondary backend service.

They wanted to know what portion of end-to-end latency came from their backend service (i.e. the streaming body future), versus a slow client taking a long time to read from the socket, versus scheduling delay or other overhead. So the on_top_level_spawn, before_poll, after_poll, and on_task_terminate hooks would all be handy for calculating metrics. Probably on_child_spawn too, to let them differentiate a slow client from scheduling delay and other overhead - though that probably belongs in Hyper as a turnkey thing.

Presumably the proposed API would also be useful for simplifying tokio_metrics::TaskMetrics, as well as for implementing tokio-rs/tokio-metrics#75 (task-specific rather than aggregated metrics).

The main thing I DON'T see that might be nice would be an on_wake hook. You need that to capture scheduled vs idle duration. Today tokio_metrics handles this via an ArcWake implementation. You could do that with these hooks too, so it's not the end of the world not to have it, but it might improve ergonomics.
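
For reference, the waker-wrapping workaround mentioned above can be modeled with std's task::Wake trait instead of the futures crate's ArcWake (this is a sketch of the pattern, not tokio_metrics' actual code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

// A waker wrapper that records wake-ups, analogous to the ArcWake
// approach tokio_metrics uses today. Comparing the time of the recorded
// wake against the next before_poll yields scheduled duration.
struct CountingWaker {
    inner: Waker,
    wakes: AtomicU64,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::Relaxed);
        self.inner.wake_by_ref();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::Relaxed);
        self.inner.wake_by_ref();
    }
}

// A no-op waker standing in for the runtime's real waker.
struct Noop;
impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}
```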

@conradludgate
Contributor

A few things came to mind regarding the current impl:

  1. on_child_spawn feels quite limiting. I can call spawn_with_hooks to set custom hooks, but I can't have variable behaviour in on_child_spawn.
  2. While I can use spawn_with_hooks to configure the hooks for different tasks being spawned, I cannot access the current task's hook.
  3. Is it possible to set a task hook for the block_on(..) task?
  4. Is it possible to use task hooks across runtimes, via Handle::spawn(..)?
  5. Should there be hooks for spawn_blocking?
  6. Can the on_task_terminate method get more information about the termination status: cancelled/panic/completed?

Some changes I think I'd want as a result:


If using tokio::task::Builder, it would be nice to be able to specify task arguments. I'm not sure what format this would be in - perhaps just Box<dyn Any>, where the task hook must downcast. Additionally, getting the task name would also be nice.


If using spawn_with_hooks, it would be nice to borrow the current hook. Something like:

fn with_current_task_hook<F>(f: F)
where
    F: FnOnce(&mut dyn Any);
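
A sketch of how such an accessor could behave, with a thread-local standing in for the runtime-owned per-task state (all names beyond the proposed signature are hypothetical):

```rust
use std::any::Any;
use std::cell::RefCell;

// Stand-in for runtime-owned per-task hook state.
thread_local! {
    static CURRENT_HOOK: RefCell<Option<Box<dyn Any>>> = RefCell::new(None);
}

// The proposed accessor: borrow the current task's hook, if any.
fn with_current_task_hook<F>(f: F)
where
    F: FnOnce(&mut dyn Any),
{
    CURRENT_HOOK.with(|slot| {
        if let Some(hook) = slot.borrow_mut().as_mut() {
            f(hook.as_mut());
        }
    });
}

// A caller downcasts to its own concrete hook type to touch its state.
struct MyHook {
    spans_entered: u32,
}
```

The &mut dyn Any signature means each caller must know (and downcast to) the concrete hook type it installed, which is the main ergonomic cost of this shape.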

@LloydW93

LloydW93 commented May 7, 2025

They wanted to know what portion of end-to-end latency came from their backend service (i.e. the streaming body future), versus a slow client taking a long time to read from the socket, versus scheduling delay or other overhead.

That's exactly what we want to know too with the service we're building that led to us bothering @Noah-Kennedy :)

We currently have an event loop based service, which we are migrating away from, where we measure the timestamp at:

  1. The start of each cycle (equivalent to when tokio would consider a task awoken)
  2. When active work on the task starts in the process (equivalent to tokio starting to poll, i.e. at before_poll proposed here)
  3. When active work on the task stops in the process (equivalent to after_poll proposed here)
  4. When the main IO FDs have full write buffers or empty read buffers

and then calculate a bunch of durations from all that to get answers to the kinds of questions you proposed above.

The main thing I DON'T see that might be nice, would be an on_wake hook. You need that to capture scheduled vs idle duration.

When we were initially discussing this proposal, the idea was that we would also get, when each hook is invoked, a pointer with a bunch of useful timestamps, such as:

  1. More-or-less now (tokio has just checked the time, saving another clock_gettime call)
  2. When the task was woken / when the io driver last checked for events
  3. The interval between the previous two io checks, making it possible to bound the best- and worst-case delay between a socket becoming readable/writable and the runtime noticing that the task can be woken

and also potentially on after_poll a yield reason such as BUDGET_EXHAUSTED, IO, YIELD_NOW.

Those timestamps would remove the need for an additional on_wake hook, which from my understanding would often end up being executed as part of io driver work - a hot path shared across all the threads, so you'd want to avoid doing work there.
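
Restating that wishlist as types (every name here is hypothetical; none of this exists in tokio today):

```rust
use std::time::{Duration, Instant};

// Hypothetical data a hook like after_poll could receive; these names
// just restate the timestamps listed above.
#[derive(Clone, Copy)]
struct PollTimestamps {
    now: Instant,                // the runtime just read the clock anyway
    woken_at: Instant,           // when the task was woken
    io_check_interval: Duration, // gap between the last two io-driver checks
}

// Hypothetical yield reason reported on after_poll.
#[derive(Clone, Copy, PartialEq, Debug)]
enum YieldReason {
    BudgetExhausted,
    Io,
    YieldNow,
}

impl PollTimestamps {
    // Scheduled delay: how long the task was runnable but not yet polled,
    // which is what an on_wake hook would otherwise be needed for.
    fn scheduled_for(&self) -> Duration {
        self.now.duration_since(self.woken_at)
    }
}
```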

@LloydW93

LloydW93 commented May 7, 2025

2. While I can use spawn_with_hooks to configure the hooks for different tasks being spawned, I cannot access the current task hook.

The idea is that you would have something like an Arc<TaskPollCtx> stored on both your TaskHookHarness implementation and your task, to pass data between them.
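
That sharing pattern can be sketched with std types (TaskPollCtx is the hypothetical name from above, not a real tokio type):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical shared context: one handle lives in the hook harness,
// the other inside the task's future, so the two can exchange data.
#[derive(Default)]
struct TaskPollCtx {
    poll_count: AtomicU64,
}

// Hypothetical harness holding one of the two handles.
struct Harness {
    ctx: Arc<TaskPollCtx>,
}

impl Harness {
    fn before_poll(&mut self) {
        self.ctx.poll_count.fetch_add(1, Ordering::Relaxed);
    }
}
```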

6. Can the on_task_terminate method get more information about the termination status: cancelled/panic/completed?

That'd be a great addition as we iterate, imo.

@conradludgate
Contributor

and also potentially on after_poll a yield reason such as BUDGET_EXHAUSTED, IO, YIELD_NOW.

That reminds me - I brought up this hooks overhaul initiative a few months back with my colleagues. One suggested that before_poll could influence the task budget. This goes beyond just hooks, and doesn't need to be in the first pass since the API is extensible, but it would be interesting to explore how hooks could be used not just for monitoring, but also for configuring task behaviour.

@carllerche
Member

@conradludgate interesting, what use cases do you have for this?

@LloydW93

LloydW93 commented May 9, 2025

how hooks could not just be for monitoring, but configuring task behaviour

Oh yes, I thought about this a while back too and completely forgot about it!

The use case I was thinking of is some fairly basic prioritisation logic, where you could group tasks into priority types and when you're under resource pressure just decide to not schedule a certain category of task at all, so before_poll could say "actually, no, don't wake again for another second", or similar.
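
A purely speculative sketch of what that could look like - no such API exists in tokio, and every name below is made up for illustration:

```rust
use std::time::Duration;

// Speculative: a hook return value that could tell the scheduler to
// defer a low-priority task while the runtime is under pressure.
enum PollDecision {
    Proceed,
    DeferFor(Duration),
}

// A user-defined grouping of tasks into priority categories.
#[derive(PartialEq)]
enum Priority {
    Low,
    High,
}

fn before_poll(priority: Priority, under_pressure: bool) -> PollDecision {
    if under_pressure && priority == Priority::Low {
        // "actually, no, don't wake again for another second"
        PollDecision::DeferFor(Duration::from_secs(1))
    } else {
        PollDecision::Proceed
    }
}
```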

@Darksonn
Contributor

I think that we should consider whether the implementation is compatible with placing the task hook object directly inside the Tokio task, and not behind a Box<dyn TaskHook>. For something that goes on every task, that's probably important for performance.

@Noah-Kennedy
Contributor Author

I think that we should consider whether the implementation is compatible with placing the task hook object directly inside the Tokio task, and not behind a Box<dyn TaskHook>. For something that goes on every task, that's probably important for performance.

That's definitely better from a performance standpoint.

From an API standpoint, what would an API that supports this look like?

@Noah-Kennedy
Contributor Author

Also, does this block the merge of the initial PR?

I don't think it should.


6 participants