CUDA extended lambdas (lambdas with `__device__` or `__host__ __device__` annotations) are a useful convenience, but they come with several restrictions that can cause silent or confusing failures in users' code.
For many of these restrictions, libcu++ can do nothing to help users, but where possible it should make an effort to detect invalid uses of extended lambdas and emit a more helpful diagnostic.
Some of the most pertinent restrictions stem from the fact that nvcc replaces extended lambdas with a placeholder type whose `operator()` is not equivalent to the lambda definition.
As described above, the CUDA compiler replaces a device extended lambda defined in a host function with a placeholder type defined at namespace scope. This placeholder type does not define an `operator()` function equivalent to the original lambda declaration. An attempt to determine the return type or parameter types of the `operator()` function may therefore work incorrectly in host code, as the code processed by the host compiler will be semantically different from the input code processed by the CUDA compiler. However, it is OK to introspect the return type or parameter types of the `operator()` function within device code. Note that this restriction does not apply to `__host__ __device__` extended lambdas.
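As a rough illustration of this restriction, consider the sketch below (names like `launch` and `host_side` are illustrative, not from libcu++; it assumes compilation with `nvcc --extended-lambda` on a CUDA-capable toolchain):

```cuda
#include <type_traits>

template <typename F>
__global__ void launch(F f) {
  // OK: introspecting the lambda's operator() inside device code.
  using R = typename std::invoke_result<F, int>::type;
  f(R{0});
}

void host_side() {
  auto square = [] __device__ (int x) { return x * x; };
  // NOT OK: in host code, decltype(square) names nvcc's placeholder type,
  // so a query like std::invoke_result<decltype(square), int> may silently
  // produce the wrong answer or fail to compile.
  launch<<<1, 1>>>(square);
}
```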
As described previously, the CUDA compiler replaces an extended `__device__` or `__host__ __device__` lambda expression with an instance of a placeholder type in the code sent to the host compiler. This placeholder type may define C++ special member functions (e.g. constructor, destructor). As a result, some standard C++ type traits may return different results for the closure type of the extended lambda in the CUDA frontend compiler versus the host compiler.
nvcc provides the `__nv_is_extended_device_lambda_closure_type(X)` and `__nv_is_extended_host_device_lambda_closure_type(X)` built-in traits to detect a `__device__` or `__host__ __device__` lambda at compile time. This enables us to detect and emit compile-time diagnostics for invalid uses with libcu++ constructs.
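One possible way to wrap these built-ins as ordinary type traits might look like the following sketch (the wrapper names are hypothetical, not libcu++'s; the guard macro `__CUDACC_EXTENDED_LAMBDA__` is defined by nvcc only when extended lambdas are enabled):

```cuda
#include <type_traits>

#if defined(__CUDACC_EXTENDED_LAMBDA__)
// True only for __device__ extended lambda closure types.
template <class T>
struct is_extended_device_lambda
    : std::integral_constant<bool, __nv_is_extended_device_lambda_closure_type(T)> {};

// True only for __host__ __device__ extended lambda closure types.
template <class T>
struct is_extended_host_device_lambda
    : std::integral_constant<bool, __nv_is_extended_host_device_lambda_closure_type(T)> {};
#else
// Outside nvcc (or without --extended-lambda) the built-ins do not exist,
// so everything is reported as "not an extended lambda".
template <class T> struct is_extended_device_lambda : std::false_type {};
template <class T> struct is_extended_host_device_lambda : std::false_type {};
#endif
```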
For example, one of the restrictions on extended lambdas is that you cannot query their return type in host code, so in NVIDIA/libcudacxx#284 we updated `cuda::std::invoke_result_t` to emit a compile-time error when used in host code.
As mentioned in restriction 17, there are several other type traits where we should make changes similar to those made for `cuda::std::invoke_result` (note that unlike with `invoke_result`, the following traits should be guarded for both `__device__` and `__host__ __device__` lambdas).
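A guarded trait could follow the same pattern in each case; a minimal sketch, assuming a hypothetical trait name (this is not the actual libcu++ implementation):

```cuda
// Reject both flavors of extended lambda up front, so users get a clear
// diagnostic instead of a silently wrong answer from the placeholder type.
template <class T>
struct checked_trait {
  static_assert(!__nv_is_extended_device_lambda_closure_type(T) &&
                !__nv_is_extended_host_device_lambda_closure_type(T),
                "this trait cannot be used with an extended lambda closure type");
  // ... the trait's normal implementation would follow here ...
};
```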
Hello everyone, I’m still a bit new to open source contributions, and out of the four unassigned "good first issues," this one seems manageable (I hope). I’ve successfully built the dev container and am ready to get started on this issue.
I noticed someone mentioned the dev Discord server in another issue. I’m wondering if the server is still active and if contributors are allowed to join?
Tasks

I also suspect there are changes we can/should make to things in `<functional>` like `cuda::std::invoke`, but that will require additional investigation.