Note that parallelism with STARPU_NWORKER_PER_CUDA needs asynchronism or threads
sthibaul committed Jul 18, 2024
1 parent c5330b8 commit 5cbea78
Showing 1 changed file with 5 additions and 0 deletions.
@@ -316,6 +316,11 @@ create as many CUDA workers as there are GPU devices.
Specify the number of workers per CUDA device, and thus the number of kernels
which will be concurrently running on the devices, i.e. the number of CUDA
streams. Default value is 1.

For parallelism to be really achieved, one also needs to make CUDA codelets
asynchronous (it is recommended for single-worker performance too anyway,
see ::STARPU_CUDA_ASYNC in \ref CUDA-specificOptimizations), or to set \ref
STARPU_CUDA_THREAD_PER_WORKER to 1.
</dd>

<dt>STARPU_CUDA_THREAD_PER_WORKER</dt>
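
As a rough illustration of the threads-based option described in the added paragraph, the environment could be set up as sketched below before launching an application. The worker count of 4 and the commented-out binary name are assumptions chosen for the example, not values taken from the commit. The other option mentioned in the text, making the codelets asynchronous with ::STARPU_CUDA_ASYNC, is done in the application code rather than through the environment.

```shell
# Assumed value for illustration: request 4 workers (CUDA streams) per GPU.
export STARPU_NWORKER_PER_CUDA=4

# Give each CUDA worker its own thread, so that kernels from different
# workers can overlap even when the codelets are synchronous.
export STARPU_CUDA_THREAD_PER_WORKER=1

# Hypothetical application binary, shown only to indicate where the
# variables take effect:
# ./my_starpu_app
```

With only the first variable set and synchronous codelets, the paragraph above implies that the extra workers would not actually run kernels concurrently.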