Describe the bug
When multiple CUDA compute capabilities are specified (e.g. -fsycl-targets=nvidia_gpu_sm_62,nvidia_gpu_sm_61), the runtime only ever chooses the first one and ignores the others.
I believe the relevant code is here, always picking the first llvm_nvptx64 it finds.
The correct behavior would be to choose the highest included compute capability that is not greater than that of the device.
So if I compile for sm_60, sm_61, sm_62 in any order and I have a Pascal (sm_61) GPU, it should only try the sm_61 program.
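To make the proposed rule concrete, here is a minimal sketch of the selection logic. The name select_best_arch is hypothetical and is not part of the SYCL runtime. The sketch assumes compute capabilities can be compared as plain integers (60, 61, 62) and that a binary built for a lower capability runs on a device with a higher one; architecture-specific variants are outside what it models.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper illustrating the proposed rule; not the runtime's code.
// compiled_archs: compute capabilities the binary was built for (e.g. 60, 61, 62),
//                 in whatever order they were passed to -fsycl-targets.
// device_arch:    compute capability of the device (e.g. 61 for this Pascal GPU).
// Returns the highest compiled capability that does not exceed device_arch,
// or std::nullopt if every compiled capability is too new for the device.
std::optional<int> select_best_arch(const std::vector<int>& compiled_archs,
                                    int device_arch) {
    std::optional<int> best;
    for (int arch : compiled_archs) {
        if (arch > device_arch) {
            continue;  // Built for a newer device than this one; skip it.
        }
        if (!best || arch > *best) {
            best = arch;  // Highest compatible capability seen so far.
        }
    }
    return best;
}
```

Under these assumptions, select_best_arch({60, 61, 62}, 61) and select_best_arch({62, 61, 60}, 61) both yield 61, so the order of the targets on the command line no longer matters.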
To reproduce
#include <sycl/sycl.hpp>

int main() {
    sycl::queue queue{sycl::default_selector_v};
    queue.submit([&] (sycl::handler& cgh) {
        auto os = sycl::stream{128, 128, cgh};
        cgh.single_task<struct hello_world>([=]() {
#ifdef __SYCL_CUDA_ARCH__
            os << "CUDA_ARCH " << __SYCL_CUDA_ARCH__ << "\n";
#else
            os << "Not CUDA\n";
#endif
        });
    });
    return 0;
}
With a Pascal (sm_61) GPU:
icpx -fsycl -fsycl-targets=nvidia_gpu_sm_60,nvidia_gpu_sm_61,nvidia_gpu_sm_62 main.cpp prints CUDA_ARCH 600 instead of the correct CUDA_ARCH 610.
icpx -fsycl -fsycl-targets=nvidia_gpu_sm_62,nvidia_gpu_sm_61,nvidia_gpu_sm_60 main.cpp errors out instead of printing CUDA_ARCH 610.
Environment
Additional context
No response