Commit a9f5dc8

fixup! Add support for worker state callbacks
1 parent 8ebd9bf commit a9f5dc8

1 file changed: +12 -10 lines changed
src/cluster.jl

Lines changed: 12 additions & 10 deletions
@@ -903,7 +903,7 @@ function _run_callbacks_concurrently(callbacks_name, callbacks_dict, warning_int
     end
 
     # Wait on the tasks so that exceptions bubble up
-    wait.(values(callback_tasks))
+    foreach(wait, values(callback_tasks))
 end
 
 function _add_callback(f, key, dict; arg_types=Tuple{Int})
@@ -928,7 +928,7 @@ _remove_callback(key, dict) = delete!(dict, key)
 """
     add_worker_starting_callback(f::Base.Callable; key=nothing)
 
-Register a callback to be called on the master process immediately before new
+Register a callback to be called on the master worker immediately before new
 workers are started. The callback `f` will be called with the `ClusterManager`
 instance that is being used and a dictionary of parameters related to adding
 workers, i.e. `f(manager, params)`. The `params` dictionary is specific to the
@@ -939,10 +939,12 @@ file for their definitions.
 
 !!! warning
     Adding workers can fail so it is not guaranteed that the workers requested
-    will exist.
+    in `manager` will exist in the future. e.g. if a worker is requested on a
+    node that is unreachable then the worker-starting callbacks will be called
+    but the worker will never be added.
 
 The worker-starting callbacks will be executed concurrently. If one throws an
-exception it will not be caught and will bubble up through [`addprocs`](@ref).
+exception it will not be caught and will be rethrown by [`addprocs`](@ref).
 
 Keep in mind that the callbacks will add to the time taken to launch workers; so
 try to either keep the callbacks fast to execute, or do the actual work
@@ -961,13 +963,13 @@ remove_worker_starting_callback(key) = _remove_callback(key, worker_starting_cal
 """
     add_worker_started_callback(f::Base.Callable; key=nothing)
 
-Register a callback to be called on the master process whenever a worker is
-added. The callback will be called with the added worker ID,
+Register a callback to be called on the master worker whenever a worker has
+been added. The callback will be called with the added worker ID,
 e.g. `f(w::Int)`. Chooses and returns a unique key for the callback if `key` is
 not specified.
 
 The worker-started callbacks will be executed concurrently. If one throws an
-exception it will not be caught and will bubble up through [`addprocs()`](@ref).
+exception it will not be caught and will be rethrown by [`addprocs()`](@ref).
 
 Keep in mind that the callbacks will add to the time taken to launch workers; so
 try to either keep the callbacks fast to execute, or do the actual
@@ -986,13 +988,13 @@ remove_worker_started_callback(key) = _remove_callback(key, worker_started_callb
 """
     add_worker_exiting_callback(f::Base.Callable; key=nothing)
 
-Register a callback to be called on the master process immediately before a
+Register a callback to be called on the master worker immediately before a
 worker is removed with [`rmprocs()`](@ref). The callback will be called with the
 worker ID, e.g. `f(w::Int)`. Chooses and returns a unique key for the callback
 if `key` is not specified.
 
 All worker-exiting callbacks will be executed concurrently and if they don't
-all finish before the `callback_timeout` passed to `rmprocs()` then the process
+all finish before the `callback_timeout` passed to `rmprocs()` then the worker
 will be removed anyway.
 """
 add_worker_exiting_callback(f::Base.Callable; key=nothing) = _add_callback(f, key, worker_exiting_callbacks)
@@ -1007,7 +1009,7 @@ remove_worker_exiting_callback(key) = _remove_callback(key, worker_exiting_callb
 """
     add_worker_exited_callback(f::Base.Callable; key=nothing)
 
-Register a callback to be called on the master process when a worker has exited
+Register a callback to be called on the master worker when a worker has exited
 for any reason (i.e. not only because of [`rmprocs()`](@ref) but also the worker
 segfaulting etc). Chooses and returns a unique key for the callback if `key` is
 not specified.
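
The concurrent-callback pattern touched by the first hunk can be sketched with Base Julia alone. This is an illustration under stated assumptions, not the package's implementation: the `callbacks` dictionary, its keys, and the worker ID `2` are hypothetical stand-ins. It shows one task per callback, then `foreach(wait, ...)` to surface any failure, which waits on each task in turn without building the array of `nothing` values that the broadcast form `wait.(...)` returns.

```julia
# Hypothetical stand-in for a callback registry (not the package's internals).
callbacks = Dict{Symbol, Function}(
    :log   => w -> "worker $w started",
    :count => w -> w + 1,
)

# Launch every callback in its own task so they run concurrently.
# The worker ID 2 is an arbitrary example argument.
callback_tasks = Dict(key => @async(f(2)) for (key, f) in callbacks)

# Wait on each task; `wait` throws a TaskFailedException if the task failed.
foreach(wait, values(callback_tasks))

# All tasks have finished, so `fetch` returns their results immediately.
results = Dict(key => fetch(t) for (key, t) in callback_tasks)

# A failing callback: the exception surfaces when the task is waited on.
failing_tasks = Dict(:bad => @async(error("boom")))
threw = try
    foreach(wait, values(failing_tasks))
    false
catch err
    err isa TaskFailedException
end
```

In this sketch the caller sees the failure as a `TaskFailedException` wrapping the original error; how the package itself reports it through `addprocs` is described only by the docstrings above.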
