rt: overhaul task hooks #7197

Noah-Kennedy · 2025-03-06T00:59:46Z

This change overhauls the entire task hooks system so that users can propagate arbitrary information between task hook invocations and pass context data between the hook "harnesses" for parent and child tasks at time of spawn.

This is intended to be significantly more extensible and long-term maintainable than the current task hooks system, and should ultimately be much easier to stabilize.

See #7306 for motivation.
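
As a rough illustration of the idea described above, the sketch below models a per-task harness that keeps its own state between hook invocations and builds the child's harness when the parent spawns. Everything in it (the `Harness` trait, `on_child_spawn`, `PollCounter`, `simulate`) is a hypothetical stand-in written for this example, not the API added by this PR.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical per-task harness; a stand-in for illustration only.
trait Harness: Send {
    fn before_poll(&mut self) {}
    fn after_poll(&mut self) {}
    /// Called on the parent's harness at spawn time. The return value
    /// becomes the child's harness (`None` means the child has no hooks).
    fn on_child_spawn(&mut self) -> Option<Box<dyn Harness>> {
        None
    }
}

/// Carries per-task state (`polls`) across invocations and passes
/// context (`depth`, the shared `log`) from parent to child.
struct PollCounter {
    depth: usize,
    polls: usize,
    log: Arc<Mutex<Vec<String>>>,
}

impl Harness for PollCounter {
    fn before_poll(&mut self) {
        self.polls += 1;
        let entry = format!("depth={} poll={}", self.depth, self.polls);
        self.log.lock().unwrap().push(entry);
    }

    fn on_child_spawn(&mut self) -> Option<Box<dyn Harness>> {
        Some(Box::new(PollCounter {
            depth: self.depth + 1,
            polls: 0,
            log: Arc::clone(&self.log),
        }))
    }
}

/// Simulates what a runtime would do: derive the child's harness from
/// the parent's at spawn, then invoke the hooks around each poll.
fn simulate(parent_polls: usize, child_polls: usize) -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let mut parent: Box<dyn Harness> = Box::new(PollCounter {
        depth: 0,
        polls: 0,
        log: Arc::clone(&log),
    });
    let mut child = parent.on_child_spawn();

    for _ in 0..parent_polls {
        parent.before_poll();
        parent.after_poll();
    }
    if let Some(child) = child.as_mut() {
        for _ in 0..child_polls {
            child.before_poll();
            child.after_poll();
        }
    }

    let entries = log.lock().unwrap().clone();
    entries
}
```

In this sketch, `simulate(1, 2)` records one entry for the parent (`depth=0 poll=1`) followed by two for the child (`depth=1 poll=1`, `depth=1 poll=2`), showing both the per-task counter and the context handed down at spawn.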

Noah-Kennedy · 2025-03-06T01:20:55Z

I'll fix CI tomorrow.

tokio/src/runtime/task_hooks/mod.rs

rcoh · 2025-03-06T11:22:26Z

Just reading the API, I really like this. I think it's an API that allows us to expand in the future (especially if there isn't a huge perf hit by virtue of open-ended XYZContext objects).

I might consider keeping the current hook APIs (but implementing them behind the scenes with the new API). I think, ultimately, they may still be simpler for some use cases.
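
One way to picture this suggestion is a small adapter that wraps callback-style hooks in a harness-style object. The sketch below is only an assumption about how such a shim could look: `Harness`, `Callback`, `CallbackHarness`, and the `u64` task id are hypothetical names invented for the example, not tokio's existing or proposed API.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical harness trait standing in for the new-style API.
trait Harness {
    fn before_poll(&mut self) {}
    fn after_poll(&mut self) {}
}

/// Hypothetical callback shape: one shared closure used by every task,
/// receiving a task id.
type Callback = Arc<dyn Fn(u64) + Send + Sync>;

/// Adapter that exposes shared callbacks through the harness interface.
struct CallbackHarness {
    task_id: u64,
    before: Option<Callback>,
    after: Option<Callback>,
}

impl Harness for CallbackHarness {
    fn before_poll(&mut self) {
        if let Some(cb) = &self.before {
            cb(self.task_id);
        }
    }

    fn after_poll(&mut self) {
        if let Some(cb) = &self.after {
            cb(self.task_id);
        }
    }
}

/// Drives one simulated poll through the adapter and returns what the
/// callbacks recorded.
fn run_callbacks() -> Vec<String> {
    let log: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(Vec::new()));
    let (l1, l2) = (Arc::clone(&log), Arc::clone(&log));

    let before: Callback = Arc::new(move |id: u64| {
        l1.lock().unwrap().push(format!("before {id}"));
    });
    let after: Callback = Arc::new(move |id: u64| {
        l2.lock().unwrap().push(format!("after {id}"));
    });

    let mut harness = CallbackHarness {
        task_id: 7,
        before: Some(before),
        after: Some(after),
    };
    harness.before_poll();
    harness.after_poll();

    let entries = log.lock().unwrap().clone();
    entries
}
```

Here `run_callbacks()` returns `before 7` followed by `after 7`. The point is only that simple closures could in principle ride on top of a harness-based design; whether that is worth the extra surface area is the question under discussion.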

Noah-Kennedy · 2025-03-06T20:34:56Z

Thank you for the feedback! Expandability was one of my main priorities with this changeset, and I'm delighted to hear that others are also excited about this.

Regarding keeping both sets of hooks around, I don't think this is a great idea due to the complexity of the systems involved, and I honestly just wanna get rid of the old APIs as they were never very good to begin with. That being said, I don't think it will be much work for folks to migrate over, and my hope is that we can leave this API completely unchanged even while we let it bake prior to stabilization, so hopefully no more migrations will be needed.

Noah-Kennedy · 2025-03-06T20:36:39Z

I'm running into some issues with loom, and I'm currently unsure if this is because of an actual issue or just false positives.

@Darksonn do you have any idea what might be causing the current failures with the multi-threaded scheduler?

mox692 · 2025-03-07T04:53:30Z

Hmm, it looks like the loom failure log is not showing up on ci? ... I captured this locally anyway:

logs

    running 1 test
    test runtime::tests::loom_multi_thread::group_c::pool_shutdown has been running for over 60 seconds

    thread 'runtime::tests::loom_multi_thread::group_c::pool_shutdown' panicked at /home/runner/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/location.rs:115:9:
    Causality violation: Concurrent write accesses to `UnsafeCell`.

    stack backtrace:
       0: rust_begin_unwind
       1: core::panicking::panic_fmt
       2: loom::rt::location::PanicBuilder::fire
       3: loom::rt::cell::State::track_write
       4: scoped_tls::ScopedKey<T>::with
       5: loom::rt::cell::Cell::start_write
       6: loom::cell::unsafe_cell::UnsafeCell<T>::with_mut
       7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
       8: tokio::runtime::scheduler::multi_thread::worker::Context::run
       9: tokio::runtime::context::scoped::Scoped<T>::set
      10: loom::thread::LocalKey<T>::try_with
      11: tokio::runtime::context::runtime::enter_runtime
      12: tokio::runtime::scheduler::multi_thread::worker::run
      13: loom::cell::unsafe_cell::UnsafeCell<T>::with_mut
      14: tokio::runtime::task::core::Core<T,S>::poll
      15: tokio::runtime::task::harness::Harness<T,S>::poll
      16: loom::cell::unsafe_cell::UnsafeCell<T>::with_mut
      17: tokio::runtime::task::UnownedTask<S>::run
      18: tokio::runtime::blocking::pool::Inner::run
      19: core::ops::function::FnOnce::call_once{{vtable.shim}}
      20: generator::stack::StackBox<F>::call_once
      21: generator::detail::gen::gen_init_impl
      22: generator::detail::asm::gen_init
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    worker thread panicking; aborting process
    error: test failed, to rerun pass `--lib`

I haven't looked at the entire code, but I suspect there's a data race caused by a concurrent write to an UnsafeCell somewhere.

Noah-Kennedy · 2025-03-07T16:48:44Z

One of the loom tests (pool_shutdown) is timing out because it goes on too long, probably because somehow the number of branches is exploding.

LloydW93 · 2025-04-29T11:46:32Z

I think the pool_shutdown timeout is because the loom-compile CI workflow doesn't build with --release. Without the release profile, I also see stack overflows in spawn_blocking_when_paused locally.

However, test runtime::tests::loom_multi_thread::group_a::only_blocking_with_pending is failing because two concurrent threads, one blocking and one non-blocking, are both running with the same task hooks, as they are inherited by default. As the inner type must be Send, this implies that the operation is safe as long as we don't modify the Option itself. I think the possibility of such a modification is what loom is warning about, though we don't actually do this, so I think the usage is safe. Here's poll() for quick reference:

    /// Safety: mutual exclusion is required to call this function.
    pub(crate) fn poll(self) {
        #[cfg(tokio_unstable)]
        self.trailer().hooks.with_mut(|ptr| unsafe {
            let _guard = ptr.as_mut().and_then(|x| {
                x.as_mut().map(|x| {
                    let _ = panic::catch_unwind(panic::AssertUnwindSafe(|| {
                        x.before_poll(&mut BeforeTaskPollContext {
                            _phantom: Default::default(),
                        })
                    }));

                    set_task_hooks(NonNull::new(
                        (&mut **x) as *mut (dyn TaskHookHarness + Send + Sync + 'static),
                    ))
                })
            });

            let vtable = self.header().vtable;
            (vtable.poll)(self.ptr);
        });
        #[cfg(not(tokio_unstable))]
        unsafe {
            let vtable = self.header().vtable;
            (vtable.poll)(self.ptr);
        }
    }

As I think we only care about the mutable reference to the Option, that would mean we can take the poll out of the with_mut closure, and that seems to make the loom tests pass.
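
The shape of that change can be sketched in isolation. The example below is a simplified, single-threaded stand-in, not the actual patch: `TrackedCell` is a hypothetical wrapper that only imitates closure-scoped write access by flagging when a write window is open, and `Hook`, `Task`, and `poll_outside_write_window` are likewise invented for illustration. It shows the hook running inside `with_mut` while the poll itself runs after the closure has returned.

```rust
use std::cell::{Cell, UnsafeCell};
use std::rc::Rc;

/// Hypothetical cell that records when a mutable-access window is open.
struct TrackedCell<T> {
    value: UnsafeCell<T>,
    writing: Cell<bool>,
}

impl<T> TrackedCell<T> {
    fn new(value: T) -> Self {
        Self { value: UnsafeCell::new(value), writing: Cell::new(false) }
    }

    /// Runs `f` with a raw pointer to the value; the write window is
    /// open only for the duration of the closure.
    fn with_mut<R>(&self, f: impl FnOnce(*mut T) -> R) -> R {
        assert!(!self.writing.replace(true), "overlapping write access");
        let out = f(self.value.get());
        self.writing.set(false);
        out
    }
}

/// Hypothetical hook interface for this example.
trait Hook {
    fn before_poll(&mut self);
}

struct CountingHook {
    calls: Rc<Cell<u32>>,
}

impl Hook for CountingHook {
    fn before_poll(&mut self) {
        self.calls.set(self.calls.get() + 1);
    }
}

struct Task {
    hooks: TrackedCell<Option<Box<dyn Hook>>>,
}

impl Task {
    fn poll(&self, poll_future: impl FnOnce()) {
        // Only the hook invocation needs mutable access to the slot.
        self.hooks.with_mut(|ptr| {
            // SAFETY: `with_mut` panics on re-entrant access and the cell
            // is not `Sync`, so no other reference to the value exists here.
            let slot: &mut Option<Box<dyn Hook>> = unsafe { &mut *ptr };
            if let Some(hook) = slot.as_mut() {
                hook.before_poll();
            }
        });
        // The write window is closed before the poll itself runs.
        poll_future();
    }
}

/// Returns (number of hook calls, whether a write window was open
/// while the simulated poll ran).
fn poll_outside_write_window() -> (u32, bool) {
    let calls = Rc::new(Cell::new(0));
    let hook: Box<dyn Hook> = Box::new(CountingHook { calls: Rc::clone(&calls) });
    let task = Task { hooks: TrackedCell::new(Some(hook)) };

    let mut write_open_during_poll = true;
    task.poll(|| write_open_during_poll = task.hooks.writing.get());
    (calls.get(), write_open_during_poll)
}
```

In this sketch `poll_outside_write_window()` returns `(1, false)`: the hook ran once, and no write window was open while the poll closure executed. A real model checker tracks accesses across threads rather than with a flag, so this only illustrates the scoping, not the concurrency analysis.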

I've pushed that change, a ci.yml update for release builds in loom tests, and a rebase to my fork at https://github.com/LloydW93/tokio/tree/lloyd/the-hookening - let me know if you just want to pull the commits into your branch or would rather I open a new PR.

Both CONTRIBUTING.md and loom's own documentation recommend running loom's tests with the --release profile: they execute many permutations of the same test, so the extra compilation time is worthwhile. In tokio-rs#7197 we see this with some individual tests taking over 60s to complete.

Noah-Kennedy · 2025-05-04T15:53:54Z

Thanks Lloyd for digging into this!

I've applied your suggestions and that fixed the loom issues.

Darksonn · 2025-05-05T14:10:02Z

Can you make a module for all the hook types and traits so we don't pollute tokio::runtime?

carllerche · 2025-05-05T15:15:52Z

This is a very involved change, including non-trivial changes to runtime internals and complex APIs. Before moving forward with this PR, we need to start with a proposal that explains the motivation, problem being solved, etc. I don't know what problem this is trying to solve.

Noah-Kennedy · 2025-05-05T16:57:02Z

Documented the motivation in #7306

Noah-Kennedy · 2025-05-05T17:04:08Z

Can you make a module for all the hook types and traits so we don't pollute tokio::runtime?

Sure!

jlizen · 2025-05-06T17:31:22Z

Documented the motivation in #7306

Left my comments on the RFC, as they were around goals / high-level API rather than impl.

github-actions bot added the R-loom-current-thread, R-loom-multi-thread, and R-loom-multi-thread-alt labels on Mar 6, 2025

Noah-Kennedy requested a review from Darksonn March 6, 2025 21:33

mox692 mentioned this pull request Mar 7, 2025

ci: enable printing in multi thread loom tests #7200

Merged

Noah-Kennedy force-pushed the noah/the-hookening branch 2 times, most recently from 75c21b5 to 8963b5d Compare March 7, 2025 19:40

Noah-Kennedy force-pushed the noah/the-hookening branch from 8963b5d to de227f8 Compare May 2, 2025 20:01

Noah-Kennedy force-pushed the noah/the-hookening branch from de227f8 to 0095c7f Compare May 2, 2025 20:01

Noah-Kennedy added 7 commits May 2, 2025 15:03

fix some loom issues

ef9fcf9

try lloyd's fix

855be67

dont add release

eb54da4

loom release

dd6c1b7

fix docs

2769520

debug assertions properly

a70726c

lol

77eb27a

add actions

200cd93

Noah-Kennedy enabled auto-merge (squash) May 4, 2025 17:23

Noah-Kennedy mentioned this pull request May 5, 2025

RFC: Task Hook Inheritance #7306

Open

Noah-Kennedy disabled auto-merge May 5, 2025 16:57

Noah-Kennedy enabled auto-merge (squash) May 5, 2025 16:58
