Enable architecture selection for DPCTL_TARGET_CUDA
#2096
base: master
View rendered docs @ https://intelpython.github.io/dpctl/pulls/2096/index.html
Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_8 ran successfully.
Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_9 ran successfully.
Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_10 ran successfully.
else()
if (DEFINED ENV{DPCTL_TARGET_CUDA})
set(_dpctl_sycl_targets "nvptx64-nvidia-cuda,spir64-unknown-unknown")
if (NOT "x${DPCTL_TARGET_CUDA}" STREQUAL "x")
Is it fair to validate DPCTL_TARGET_CUDA only when DPCTL_SYCL_TARGETS is empty?
That is how we were doing it before, but it looks like the current logical flow will add HIP targets even when DPCTL_SYCL_TARGETS is not none, while it will not add CUDA targets.
So that should probably be changed: either make DPCTL_SYCL_TARGETS exclusive of both, or check DPCTL_TARGET_CUDA as well.
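As an illustration of the first alternative above (making DPCTL_SYCL_TARGETS exclusive of both backend options), a minimal sketch might look like the following. This is an assumption about one possible shape of the logic, not the project's actual code; the warning text and the exact target composition are hypothetical, while the triple names are the ones quoted in this thread.

```cmake
# Hypothetical sketch of the "exclusive" alternative; not dpctl's actual logic.
if (NOT "x${DPCTL_SYCL_TARGETS}" STREQUAL "x")
    # User-supplied targets win; the user is responsible for their correctness.
    if ((NOT "x${DPCTL_TARGET_CUDA}" STREQUAL "x") OR
        (NOT "x${DPCTL_TARGET_HIP}" STREQUAL "x"))
        message(WARNING
            "DPCTL_SYCL_TARGETS is set; ignoring DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP")
    endif()
    set(_dpctl_sycl_targets "${DPCTL_SYCL_TARGETS}")
else()
    # No explicit targets: derive them from the backend options.
    set(_dpctl_sycl_targets "spir64-unknown-unknown")
    if (NOT "x${DPCTL_TARGET_CUDA}" STREQUAL "x")
        set(_dpctl_sycl_targets "nvptx64-nvidia-cuda,${_dpctl_sycl_targets}")
    endif()
    if (NOT "x${DPCTL_TARGET_HIP}" STREQUAL "x")
        set(_dpctl_sycl_targets "amdgcn-amd-amdhsa,${_dpctl_sycl_targets}")
    endif()
endif()
```

The second alternative would instead keep both branches active and validate DPCTL_TARGET_CUDA in the same place where DPCTL_TARGET_HIP is already checked.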
In my understanding, when the user passes DPCTL_SYCL_TARGETS, they are responsible for the correctness of the flags.
The check if (NOT "x${DPCTL_TARGET_HIP}" STREQUAL "x") when DPCTL_SYCL_TARGETS is not none was added to pass the correct compile and link options.
if(_dpctl_amd_targets)
list(APPEND _dpctl_sycl_target_compile_options -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=${_dpctl_amd_targets})
list(APPEND _dpctl_sycl_target_link_options -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=${_dpctl_amd_targets})
endif()
I am already working on a PR that will refresh the logic for the AMD build using aliases, which will remove the if(_dpctl_amd_targets) branch.
For reference, compute architecture strings like ``sm_80`` are based on
CUDA Compute Capability. A complete mapping between NVIDIA GPU models and their
respective ``sm_XX`` values can be found in the official
`CUDA GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`_.
The mapping is not clear from the reference doc.
Yes, it seems they aren't necessarily related either (see here and below it)
A CUDA developer notes that sm_XX refers to machine code for a specific GPU hardware architecture. Since each Compute Capability version corresponds to a particular architecture (CC 8.0 -> Ampere A100), it is reasonable to say that sm_80 corresponds to CC 8.0.
I changed the text a bit.
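To make the correspondence discussed above concrete, here is a small sketch that derives an sm_XX string from a Compute Capability written as major.minor. The helper name dpctl_cc_to_sm is hypothetical and is not part of dpctl; the sketch only encodes the naming convention described in this thread (CC 8.0 corresponds to sm_80) and does not check that a given architecture actually exists.

```cmake
# Hypothetical helper, not part of dpctl: turn "8.0" into "sm_80".
function(dpctl_cc_to_sm cc out_var)
    # Expect <major>.<minor> with a single-digit minor version.
    if (NOT "${cc}" MATCHES "^([0-9]+)[.]([0-9])$")
        message(FATAL_ERROR "Expected a Compute Capability like 8.0, got: ${cc}")
    endif()
    # Concatenate major and minor digits after the "sm_" prefix.
    set(${out_var} "sm_${CMAKE_MATCH_1}${CMAKE_MATCH_2}" PARENT_SCOPE)
endfunction()

# Example use: the result could then be passed as DPCTL_TARGET_CUDA.
dpctl_cc_to_sm("8.0" _dpctl_example_arch)
message(STATUS "Architecture string: ${_dpctl_example_arch}")
```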
Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_17 ran successfully.
This PR proposes to change the DPCTL_TARGET_CUDA CMake option from a boolean to a string, allowing users to specify a CUDA architecture (e.g. sm_80). If not specified, it defaults to sm_50.
The specified architecture is used to construct a SYCL alias target (e.g. nvidia_gpu_sm_80) and is passed via the -fsycl-targets option, following oneAPI for NVIDIA GPUs.
Additionally, this PR removes the DPCTL_TARGET_CUDA env handling logic.
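The behavior described above can be sketched roughly as follows. This is an illustrative assumption, not the code from this PR: the validation regex, the treatment of boolean-like values as a request for the sm_50 default, the helper variable names, and the choice to keep spir64-unknown-unknown alongside the alias are all hypothetical.

```cmake
# Illustrative sketch only; not the implementation from this PR.
set(DPCTL_TARGET_CUDA "" CACHE STRING
    "CUDA architecture to target, e.g. sm_80; empty or OFF disables CUDA")

set(_dpctl_sycl_targets "spir64-unknown-unknown")
if (DPCTL_TARGET_CUDA)
    string(TOUPPER "${DPCTL_TARGET_CUDA}" _dpctl_cuda_upper)
    if (DPCTL_TARGET_CUDA MATCHES "^sm_[0-9]+[a-z]?$")
        # An explicit architecture string was given.
        set(_dpctl_cuda_arch "${DPCTL_TARGET_CUDA}")
    elseif (_dpctl_cuda_upper MATCHES "^(ON|TRUE|YES|Y|1)$")
        # Assumption: a boolean-like value selects the default architecture.
        set(_dpctl_cuda_arch "sm_50")
    else()
        message(FATAL_ERROR
            "Invalid DPCTL_TARGET_CUDA value: ${DPCTL_TARGET_CUDA}")
    endif()
    # Build the SYCL alias target, e.g. nvidia_gpu_sm_80.
    set(_dpctl_sycl_targets
        "nvidia_gpu_${_dpctl_cuda_arch},${_dpctl_sycl_targets}")
endif()

# The resulting list would be handed to the compiler via -fsycl-targets.
message(STATUS "SYCL targets: ${_dpctl_sycl_targets}")
```

Under these assumptions, configuring with -DDPCTL_TARGET_CUDA=sm_80 would select the nvidia_gpu_sm_80 alias, while leaving the option empty would build only for the default SPIR-V target.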