From 5cbea788dd2140f22128f73825333e191495a457 Mon Sep 17 00:00:00 2001
From: Samuel Thibault
Date: Thu, 18 Jul 2024 16:32:46 +0200
Subject: [PATCH] Note that parallelism with STARPU_NWORKER_PER_CUDA needs asynchronism or threads

---
 .../chapters/starpu_installation/environment_variables.doxy | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/doxygen/chapters/starpu_installation/environment_variables.doxy b/doc/doxygen/chapters/starpu_installation/environment_variables.doxy
index 3492ceb87b..9322c4b2ea 100644
--- a/doc/doxygen/chapters/starpu_installation/environment_variables.doxy
+++ b/doc/doxygen/chapters/starpu_installation/environment_variables.doxy
@@ -316,6 +316,11 @@ create as many CUDA workers as there are GPU devices.
 Specify the number of workers per CUDA device, and thus the number of kernels
 which will be concurrently running on the devices, i.e. the number of CUDA
 streams. Default value is 1.
+
+For parallelism to be really achieved, one also needs to make CUDA codelets
+asynchronous (it is recommended for single-worker performance too anyway,
+see ::STARPU_CUDA_ASYNC in \ref CUDA-specificOptimizations), or to set \ref
+STARPU_CUDA_THREAD_PER_WORKER to 1.
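As a hedged illustration of the second option in the note, here is a minimal shell sketch. It assumes a bash-like shell and a hypothetical StarPU application binary `./my_app`, and the worker count of 4 is an arbitrary example value. It sets STARPU_NWORKER_PER_CUDA to get several workers (hence several CUDA streams) per device, and sets STARPU_CUDA_THREAD_PER_WORKER to 1 so that each of those workers is driven by its own CPU thread. With that setup, kernels from different workers can run concurrently even when the CUDA codelets are synchronous.

```shell
# Run four workers, hence four CUDA streams, on each CUDA device
# (4 is an arbitrary example value; the default is 1).
export STARPU_NWORKER_PER_CUDA=4
# Drive each CUDA worker from its own CPU thread, so that synchronous
# CUDA codelets on different streams of a device can still run concurrently.
export STARPU_CUDA_THREAD_PER_WORKER=1
# Hypothetical StarPU application binary; replace with your own program.
./my_app
```

The other option the note recommends is to make the CUDA codelets asynchronous with ::STARPU_CUDA_ASYNC. That is a change to the codelet definitions rather than an environment setting, and it also helps performance when only one worker per device is used.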