Low GPU utilization on 1080ti, 2080ti and TitanX #80

Open

DiamonDinoia opened this issue Jun 13, 2020 · 4 comments

Comments
@DiamonDinoia

DiamonDinoia commented Jun 13, 2020

Hello,
I have been using the library for one of my research projects and noticed that the GPU is not fully utilized. Reading the code, I found some hardcoded values. For example:

dim3 block_dim(64, 1, 8);

inline dim3 getOptimalGridDim(long N, long thread_count)

#define THREAD_BLOCK_SIZE 256

Do you know if it is possible to change these values to increase the parallelism?
Or is there another way to do so? I'm happy to spend some time making these values parametric based on the architecture.

Another possible strategy, if these values cannot be changed, would be "CUDA Dynamic Parallelism" (https://devblogs.nvidia.com/cuda-dynamic-parallelism-api-principles/).
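
For illustration, a minimal sketch of that idea; the kernels here are hypothetical placeholders, not gpuNUFFT code, and device-side launches require compute capability >= 3.5 plus compilation with -rdc=true:

```cuda
// Hypothetical sketch of CUDA dynamic parallelism (not gpuNUFFT code):
// a parent kernel sizes and launches a child grid from a value that is
// only known on the device.
__global__ void childKernel(float *out, long N)
{
    long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
    if (i < N)
        out[i] = 1.0f;  // per-element work would go here
}

__global__ void parentKernel(float *out, long N)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 256;                                  // illustrative value
        unsigned blocks = (unsigned)((N + threads - 1) / threads);
        childKernel<<<blocks, threads>>>(out, N);
        // No explicit device-side sync needed here: a parent grid is not
        // considered complete until all of its child grids have completed.
    }
}
```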

Thanks.
Marco

@andyschwarzl
Owner

Hi,

thanks for pointing that out. The code was written to support most of today's "old" GPUs, going all the way back to compute capability 1.3 :P

Feel free to modify the code; I'd appreciate any pull request that makes GPU utilization more dynamic.

Thanks!

Best regards,

Andreas

@DiamonDinoia
Author

Do you know of any problems with that, and how did you derive those numbers?
That would be a good starting point: if they depend on the input, it might be possible to use dynamic parallelism; if they depend on the hardware, I can use the runtime to determine them.
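
If they turn out to be hardware-dependent, I imagine something along these lines; this is a minimal sketch with a placeholder kernel, not the library's actual code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for one of the library's kernels.
__global__ void placeholderKernel(float *out, long N)
{
    long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
    if (i < N)
        out[i] = 0.0f;
}

int main()
{
    // Query the device instead of relying on hardcoded assumptions.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("SMs: %d, max threads/block: %d\n",
           prop.multiProcessorCount, prop.maxThreadsPerBlock);

    // Let the occupancy API suggest a block size for this device,
    // replacing a fixed THREAD_BLOCK_SIZE.
    int minGridSize = 0, blockSize = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                       placeholderKernel, 0, 0);

    long N = 1 << 20;
    long gridSize = (N + blockSize - 1) / blockSize;
    printf("suggested block size: %d, grid size: %ld\n", blockSize, gridSize);
    return 0;
}
```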

@andyschwarzl
Owner

The parameters above are basically hardware-dependent; I used the occupancy tool to derive these values. Another parameter is sectorWidth, which essentially defines the amount of shared memory used per thread block. So I guess a good starting point would be to increase the thread count and sectorWidth and see whether performance/utilization improves.
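
For example, the occupancy API can show how the shared memory a block requests (which is what sectorWidth effectively controls) limits the number of resident blocks per SM. A minimal sketch with a toy kernel, assuming nothing about gpuNUFFT's internals:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel using dynamically sized shared memory, standing in for a
// kernel whose shared-memory footprint is set by sectorWidth.
__global__ void sharedMemKernel(float *out)
{
    extern __shared__ float tile[];
    tile[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = tile[threadIdx.x];
}

int main()
{
    const int blockSize = 256;
    // Sweep the dynamic shared-memory request and report how many
    // blocks can be resident on one SM at each size.
    for (size_t smem = 1024; smem <= 32 * 1024; smem *= 2) {
        int blocksPerSM = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, sharedMemKernel, blockSize, smem);
        printf("%6zu B shared memory -> %d resident blocks per SM\n",
               smem, blocksPerSM);
    }
    return 0;
}
```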

@chaithyagr
Contributor

Perhaps we could add code based on CUDA capability, protected with #ifdef.
Further, I noticed that n_coils_cc is always 1 for 3D; maybe we could use the same function we use for GPU memory estimation in 2D to obtain a better n_coils_cc.
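
A minimal sketch of that idea, using the __CUDA_ARCH__ macro with illustrative, untuned constants (for host-side choices such as block dimensions, the same split could be made at run time from cudaDeviceProp::major/minor):

```cuda
// Hypothetical sketch: pick per-architecture constants at compile time.
// __CUDA_ARCH__ is only defined during device compilation, and the
// values below are illustrative, not tuned for gpuNUFFT.
__global__ void scaleKernel(float *out, const float *in, long N)
{
#if __CUDA_ARCH__ >= 700      // Volta/Turing and newer (e.g. 2080 Ti)
    const int UNROLL = 8;
#elif __CUDA_ARCH__ >= 600    // Pascal (e.g. 1080 Ti, TITAN X)
    const int UNROLL = 4;
#else                         // older architectures
    const int UNROLL = 2;
#endif
    // Each thread processes UNROLL consecutive elements.
    long base = (blockIdx.x * (long)blockDim.x + threadIdx.x) * UNROLL;
    for (int k = 0; k < UNROLL; ++k) {
        long i = base + k;
        if (i < N)
            out[i] = 2.0f * in[i];
    }
}
```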
