Add timeout parameter to wait(::Condition)
#56974
Conversation
# Confirm that the waiting task is still in the wait queue and remove it. If
# the task is not in the wait queue, it must have been notified already so we
# don't do anything here.
This appears to introduce a data race though, so we cannot merge this
How's that? We're locking the condition variable here.
This `Timer` runs concurrently with the return from `wait`, so by the time this code runs, you might have just corrupted some arbitrary subsequent `wait` on the same condition; or, by the time you schedule the `TimeoutError`, it could blow up some completely unrelated `wait`.
Ah, okay. There's an ABA problem. Let me see if I can find a solution for that.

But the waiting task is only scheduled with a `TimeoutError` if it was in this condition's wait queue, so I'm not sure I understand your "or" case here -- the only subsequent `wait` that could get blown up is a `wait` on the same condition, which is the same ABA problem?
It could have been in the waitq, then removed before you got around to scheduling it, or vice versa, with some other thread scheduling it before it got around to removing it from the queue. That code runs on other threads, so it could be concurrent. There is potentially no guarantee that you can safely mutate this data structure concurrently on two threads (#55542).
Pushed a fix for the ABA problem that relies on happens-before: if the waiter was scheduled, it sets `waiter_left` before returning. It can only re-enter the condition's wait queue through another call to `wait`, for which it must acquire the lock.

We acquire the condition's lock before checking `waiter_left` and checking for the task's presence in the wait queue. If the task is present, it can only be because it has not been scheduled: if it had been scheduled, it would have set `waiter_left` before re-entering the wait queue.

I think the combination of the lock and the atomic ensures there is no ABA problem.
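The locking discipline described above can be sketched in Python (the PR itself is Julia; `MiniCondition`, `TimedWaiter`, and `fire_timeout` here are invented names for illustration, not the PR's actual code). The key invariant: the timeout path may schedule the waiter only after confirming, under the condition's lock, that the waiter is still queued and has not left.

```python
# Hypothetical sketch of the waiter_left pattern described above.
import threading

class TimedWaiter:
    """Models one task waiting on a condition-like wait queue."""
    def __init__(self):
        self.waiter_left = False  # set (under the lock) when the waiter exits wait()
        self.timed_out = False

class MiniCondition:
    def __init__(self):
        self.lock = threading.Lock()
        self.waitq = []           # queue of TimedWaiter objects

    def notify_one(self):
        # notify removes the waiter from the queue *before* scheduling it,
        # all while holding the lock -- so the timeout path below can never
        # resurrect a waiter that was already notified.
        with self.lock:
            if self.waitq:
                w = self.waitq.pop(0)
                w.waiter_left = True
                return w
        return None

    def fire_timeout(self, w):
        # The timer callback: deliver the timeout only if the waiter has not
        # left and is still queued. Both checks happen under the lock.
        with self.lock:
            if not w.waiter_left and w in self.waitq:
                self.waitq.remove(w)
                w.timed_out = True
                return True   # timeout won the race
        return False          # already notified (or already timed out)

cond = MiniCondition()

# Case 1: notify wins; the later timer callback does nothing.
w = TimedWaiter()
cond.waitq.append(w)
cond.notify_one()
assert cond.fire_timeout(w) is False

# Case 2: nothing notifies; the timer delivers the timeout exactly once.
w2 = TimedWaiter()
cond.waitq.append(w2)
assert cond.fire_timeout(w2) is True
assert cond.fire_timeout(w2) is False
```

Because both paths pop the waiter under the same lock, at most one of them ever schedules it.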
> It could have been in the waitq, then removed before you got around to scheduling it

We acquire the lock, confirm that the waiter did not leave, and remove it from the wait queue before scheduling it. If it was not in the wait queue, we do not schedule it, and this decision is made while holding the lock.
> some other thread scheduling before it got around to removing it from the queue

If the task is scheduled by `notify`, then it is removed from the condition's wait queue before it is scheduled, which is done while holding the condition's lock. If it is not in the wait queue, then we do not schedule it.
How difficult would it be to re-use this implementation for waiting on other objects like …

Both …
(force-pushed 53a5d21 to 0d2c642)
A timed-out … Adjusted the tests as well.
end
unlock(c.lock)
# send the waiting task a timeout
dosched && schedule(ct, :timed_out)
What would you think about throwing an instance of a custom struct here instead?

struct TimeOutEvent end

and returning

dosched && schedule(ct, TimeOutEvent())

With the current code, I worry about someone having a typo and missing the timeout for that reason :(

if wait(cond) === :time_out # oops, this will never be reached
    # handle timeout
end
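The typo hazard raised here generalizes beyond Julia. A rough Python analogue (names invented for this sketch; strings stand in for Julia's Symbols): comparing against a misspelled string silently never matches, while a misspelled sentinel type fails loudly at name-lookup time.

```python
# Illustration of symbol/string return values vs. a dedicated sentinel type.

class TimeOutEvent:
    """Sentinel returned on timeout; a typo in its name is a NameError."""
    def __eq__(self, other):
        return isinstance(other, TimeOutEvent)

def wait_with_symbol():
    return "timed_out"        # stands in for the Symbol :timed_out

def wait_with_sentinel():
    return TimeOutEvent()

# With a string/symbol, a typo silently never matches:
assert (wait_with_symbol() == "time_out") is False  # oops -- typo goes unnoticed

# With a sentinel type, the same typo fails loudly:
caught = False
try:
    wait_with_sentinel() == TimeOutEvnt()           # NameError: misspelled type
except NameError:
    caught = True
assert caught
```

This is the trade-off under discussion: the symbol keeps the interface consistent with `Base.timedwait`, while the struct makes the typo a hard error.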
Oh, wow, I just read your commit comment:

> Return :timed_out instead of :timeout like Base.timedwait

.... :/ I don't love that decision for Base..... But I also don't love the idea of doing something different than timedwait....

I think my preference is still to use a custom struct here, but I feel a bit less confident about it now.
I don't love the special symbol either, but I didn't want to have two different timed waits in `Base` with different interfaces.
Would be good to get some more reviews on this PR.
This LGTM! Thanks for addressing the test flakiness.
I like the return API better than throwing an exception 👍
LGTM except for my suggestion to change to a type instead of a symbol for the return value
Would be good to get some additional reviews. Attn: @vchuravy, @JeffBezanson, @gbaraldi, @topolarity.
dosched && schedule(ct, :timed_out)
end
t.sticky = false
Threads._spawn_set_thrpool(t, :interactive)
What happens if we have no threads in the interactive threadpool?
`Threads._spawn_set_thrpool` will set the threadpool to `:default` if there are no interactive threads.
# Confirm that the waiting task is still in the wait queue and remove it. If
# the task is not in the wait queue, it must have been notified already so we
# don't do anything here.
if !waiter_left[] && ct.queue == c.waitq
Might be useful to note the orderings here in a little diagram?
Added a comment describing the typical flows and some possible interleavings.
(force-pushed ee1a4cb to 11ef85a)
LGTM
I have to admit, I'm not keen that this kind of API was merged. I would have much preferred to a) see a generic cancellation framework designed that handles timeouts as a special case of cancelling operations, and b) an implementation in a package to test things out until a good design is achieved. Is the plan now to add the … Would it not be possible to add an implementation for …
I agree that a more general cancellation mechanism would be better. However, there are many significant challenges to implementing such a thing, especially around managing multi-threaded interactions. Given the time and resources available to our team, having this is better than having nothing.
We have a need for this capability. I believe this closes JuliaLang#36217. The implementation is straightforward and there are a couple of tests.
It may be worth knowing that all of our blocking event-based objects (Channels, IO objects, etc.) already support cancellation (that is how this PR is implemented "under the hood"), while there is only sporadic and unreliable support for timeouts.
Should we unwind this for 1.12, and give it some more time to bake?
I don't see any ability to cancel a blocking …
We can run with this patch in our build, so this can be unwound if there's a better solution planned. Is there such a plan, however?
All cancellation is implemented with …
This approach pushes the complexity of managing synchronization issues to user code. I need a separate task to …
AFAICT from a quick scan of the literature, cancellation tokens are still being researched. There is a recent (2022) survey of problems with cancellations and timeouts; it seems clear that there is no ideal solution at this time.
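For context on the "timeouts as a special case of cancellation" framing discussed above, here is a minimal cancellation-token toy in Python. It is an assumption-laden sketch (the `CancellationToken` API is invented here), not a proposal for Base: a timeout is modelled as a cancel scheduled for later, and the worker checks the token cooperatively between steps.

```python
# Toy cancellation token: timeouts are just deferred cancellation.
import threading
import time

class CancellationToken:
    def __init__(self):
        self._cancelled = threading.Event()

    def cancel(self):
        self._cancelled.set()

    def cancel_after(self, seconds):
        # A timeout is a cancel scheduled for later -- the "special case"
        # of cancellation mentioned in the discussion above.
        threading.Timer(seconds, self.cancel).start()

    def run(self, work, poll=0.005):
        # Run `work` (a generator yielding between steps) to completion,
        # or stop between steps once the token is cancelled (cooperative).
        for _ in work:
            if self._cancelled.is_set():
                return "cancelled"
            time.sleep(poll)
        return "done"

def steps(n):
    for i in range(n):
        yield i

tok = CancellationToken()
assert tok.run(steps(3)) == "done"       # never cancelled: runs to completion

tok2 = CancellationToken()
tok2.cancel()                            # cancelled up front
assert tok2.run(steps(1000)) == "cancelled"
```

The survey's point stands, though: cooperative checking like this pushes responsibility onto the worker, which is exactly the complexity-in-user-code problem raised in this thread.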
See #57148.