Consider a series of enqueues to a thread pool with two threads: func1(), stalled_func(), func2(), wait a second, func3(). Execution of func3() will unstall stalled_func().
Since enqueues are round-robin, func1() and func2() are enqueued on Thread 1, and stalled_func() and func3() on Thread 2.
After func2(), Thread 1 goes to sleep in sem.acquire_many(), waiting to be signalled, since _in_flight == 0. After a second, when func3() is enqueued, it is pushed to Thread 2's queue and Thread 2's semaphore is signalled, but Thread 2 cannot execute func3() because it is still in the middle of executing stalled_func(). Thread 1 will continue to wait without stealing Thread 2's pending work.
This will cause the thread-pool to deadlock.
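To make the assignment concrete, here is a minimal single-threaded model of the round-robin enqueue described above. It is only a sketch in Python, not the pool's actual code; `round_robin_assign` is a hypothetical helper, and index 0 stands for Thread 1, index 1 for Thread 2.

```python
from collections import deque


def round_robin_assign(tasks, num_threads):
    """Assign tasks to per-thread queues in strict round-robin order."""
    queues = [deque() for _ in range(num_threads)]
    for i, task in enumerate(tasks):
        queues[i % num_threads].append(task)
    return queues


# Enqueue order from the scenario above (task names only, nothing is executed).
queues = round_robin_assign(["func1", "stalled_func", "func2", "func3"], 2)

# Index 0 (Thread 1) holds func1 and func2; index 1 (Thread 2) holds
# stalled_func and func3, so func3 sits behind the very task it must unblock.
for thread_index, queue in enumerate(queues):
    print(f"Thread {thread_index + 1}: {list(queue)}")
```

Without work stealing or idle-aware dispatch, nothing ever moves func3 off Thread 2's queue, which is why the pool hangs.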
If this behaviour (execution of func3() unstalling stalled_func()) is not supported, then that's fair. It's a reasonable limitation for a thread pool. The situation arises when executing a DAG of dynamically connected tasks (e.g. reading assets from disk), where a task node can continue execution only once all of its parent nodes have finished execution.
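For readers unfamiliar with this kind of workload, the dependency rule can be sketched as a pending-parent counter per node. This is an illustrative Python model, not code from the pool or from my project; `TaskNode` and `finish` are hypothetical names, and a real multi-threaded version would need an atomic counter.

```python
class TaskNode:
    """A DAG node that becomes runnable once all of its parents have finished."""

    def __init__(self, name, parents=()):
        self.name = name
        self.children = []
        self.pending_parents = len(parents)
        for parent in parents:
            parent.children.append(self)


def finish(node):
    """Mark `node` as finished and return the children that became runnable."""
    ready = []
    for child in node.children:
        child.pending_parents -= 1
        if child.pending_parents == 0:
            ready.append(child)
    return ready


# Two loads feed one combine step; the combine step waits for both parents.
load_a = TaskNode("load_a")
load_b = TaskNode("load_b")
combine = TaskNode("combine", parents=(load_a, load_b))

after_a = finish(load_a)  # combine still has one unfinished parent
after_b = finish(load_b)  # combine is now runnable
```

A task that blocks inside the pool until such a counter reaches zero plays the role of stalled_func() in the scenario above.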
I'm opening an issue for posterity if/when anyone wants to handle this case too.
Here's an idea to get started: track idle threads. Before a thread goes to sleep (sem.acquire_many), add it to an idle-thread queue. Whenever a task needs to be enqueued, pop a thread from the idle-thread queue and push the task to that thread's queue.
This works reasonably well for some thread pools I've implemented. rayon uses a much more mature, advanced version of this idea.
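A rough sketch of the dispatch decision, in Python for brevity. The names `IdleAwareDispatcher`, `mark_idle` and `pick_target` are hypothetical, and falling back to round-robin when no thread is idle is my assumption, not something this pool necessarily does.

```python
import threading
from collections import deque


class IdleAwareDispatcher:
    """Chooses the worker for a new task: an idle one if any, else round-robin."""

    def __init__(self, num_threads):
        self._num_threads = num_threads
        self._next = 0            # round-robin cursor (assumed fallback policy)
        self._idle = deque()      # ids of workers that are about to sleep
        self._lock = threading.Lock()

    def mark_idle(self, thread_id):
        """Called by a worker just before it blocks on its semaphore."""
        with self._lock:
            if thread_id not in self._idle:
                self._idle.append(thread_id)

    def pick_target(self):
        """Return the id of the worker whose queue should receive the next task."""
        with self._lock:
            if self._idle:
                return self._idle.popleft()
            target = self._next
            self._next = (self._next + 1) % self._num_threads
            return target


# Replay the scenario from the issue with worker ids 0 and 1.
dispatcher = IdleAwareDispatcher(2)
targets = [dispatcher.pick_target() for _ in ("func1", "stalled_func", "func2")]
dispatcher.mark_idle(0)                  # worker 0 drained its queue and goes to sleep
func3_target = dispatcher.pick_target()  # func3 goes to the sleeping worker
```

In this replay func3 lands on worker 0 instead of queueing behind stalled_func on worker 1. It does not help a task that was already queued behind a stalled one before any thread went idle; that case would still need some form of work stealing.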