Remove eager synchronization with HtoD copies. (#2625)
We assumed unpinned memory would always synchronize, but that does
not seem to be the case. For some copy sizes (and potentially on
some, e.g. coherent, memory architectures) the copy is fully
asynchronous.

This optimization was made to make `CuRef` of a scalar fully async.
I considered making the `CuRef` ctor call `memset` instead, which
is always asynchronous by virtue of passing the memory by value.
However, that does not support 64-bit floats, while a `memcpy` of
64 bits is still executed fully asynchronously.
maleadt authored Jan 17, 2025
1 parent d07a245 commit 3d45d85
Showing 1 changed file with 4 additions and 12 deletions.
16 changes: 4 additions & 12 deletions src/array.jl
@@ -527,12 +527,9 @@ Base.copyto!(dest::DenseCuArray{T}, src::DenseCuArray{T}) where {T} =
 function Base.unsafe_copyto!(dest::DenseCuArray{T}, doffs,
                              src::Array{T}, soffs, n) where T
   context!(context(dest)) do
-    # operations on unpinned memory cannot be executed asynchronously, and synchronize
-    # without yielding back to the Julia scheduler. prevent that by eagerly synchronizing.
-    if use_nonblocking_synchronization
-      is_pinned(pointer(src)) || synchronize()
-    end
-
+    # the copy below may block in `libcuda`, so it'd be good to perform a nonblocking
+    # synchronization here, but the exact cases are hard to know and detect (e.g., unpinned
+    # memory normally blocks, but not for all sizes, and not on all memory architectures).
     GC.@preserve src dest begin
       unsafe_copyto!(pointer(dest, doffs), pointer(src, soffs), n; async=true)
       if Base.isbitsunion(T)
@@ -546,12 +543,7 @@ end
 function Base.unsafe_copyto!(dest::Array{T}, doffs,
                              src::DenseCuArray{T}, soffs, n) where T
   context!(context(src)) do
-    # operations on unpinned memory cannot be executed asynchronously, and synchronize
-    # without yielding back to the Julia scheduler. prevent that by eagerly synchronizing.
-    if use_nonblocking_synchronization
-      is_pinned(pointer(dest)) || synchronize()
-    end
-
+    # the copy below may block in `libcuda`; see the note above.
     GC.@preserve src dest begin
       # semantically, it is not safe for this operation to execute asynchronously, because
       # the Array may be collected before the copy starts executing. However, when using