[EPIC]: Make CUB device-side algorithms work with NVRTC/Jitify #403
Comments
Maybe we can kill two birds with one stone here and tackle #318 as part of this as well.
I have a few thoughts after working intensively on integrating CCCL + Jitify in CuPy. (Jitify 1, to be precise.) xref: cupy/cupy#7851, cupy/cupy#7869
This is the anatomy of Jitify from my perspective. For any user-provided CUDA C++ kernel string, it
Item 1 is essential to compile any C++ header with NVRTC, because unlike
Item 3 is a nice-to-have feature that is probably not needed by libraries like libcudacxx (certainly true for CuPy) that have their own infra (at least for testing purposes).
Item 2 is the problematic one (especially after NVIDIA/jitify#118 was merged). One way or another, custom std patches from Jitify would conflict with libcudacxx (if available and included). Item 2 existed for historical reasons (NVRTC offered no built-in std functionality and libcudacxx was not yet a thing), but now that libcudacxx is getting mature (especially with the full-fledged
Ultimately, my wishlist is that we hard-wire libcudacxx into NVRTC so that we can also completely eliminate Item 1 (and arguably Jitify too 😅), just like
cc: @maddyscientist @benbarsdell for vis
Hi, pretty awesome to have more of these things running with NVRTC! Should this also work with device-wide sorts, or is that not supported yet? I've been loosely following this example to make it work and I was able to compile with
Are the device-wide sort algorithms (callable from within a kernel) not ready for NVRTC yet, or am I doing something wrong?
I was wondering if eventually I could use device-wide sort via something like
Thanks!
Hey @m-schuetz, you reminded me I never responded to the discussion you'd opened. I just responded there :)
Is this a duplicate?
Area
CUB
Is your feature request related to a problem? Please describe.
As a user of CUB, I would like to be able to use device-side algorithms like cub::BlockReduce in kernels that are compiled at runtime with NVRTC/Jitify.
However, this is not an explicitly supported use case, nor does CUB have any testing that verifies this works.
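To make the use case concrete, here is a minimal sketch of the kind of kernel source this request is about, together with an option list a host program might assemble before handing both to NVRTC (for example via nvrtcCreateProgram and nvrtcCompileProgram). The kernel name block_sum, the helper make_nvrtc_options, the block size of 128, and the chosen flags are assumptions made for illustration, not part of CUB or of this issue. Whether the source actually compiles under NVRTC depends on the CUB version and include setup, which is exactly what this issue asks to have supported and tested.

```cpp
#include <string>
#include <vector>

// Kernel source as a runtime string, the form NVRTC/Jitify consume.
// Assumption: launched with exactly 128 threads per block and an input
// length that is a multiple of 128, so no bounds check is shown.
const char* const kBlockSumSource = R"cuda(
#include <cub/block/block_reduce.cuh>

extern "C" __global__ void block_sum(const int* in, int* out)
{
    using BlockReduce = cub::BlockReduce<int, 128>;
    __shared__ typename BlockReduce::TempStorage temp_storage;

    int thread_value = in[blockIdx.x * blockDim.x + threadIdx.x];

    // The reduced value is only valid in thread 0 of the block.
    int block_total = BlockReduce(temp_storage).Sum(thread_value);

    if (threadIdx.x == 0) {
        out[blockIdx.x] = block_total;
    }
}
)cuda";

// Hypothetical helper: builds NVRTC command-line options.
// -std, --gpu-architecture and -I are documented NVRTC options; the
// particular values used here are illustrative assumptions.
std::vector<std::string> make_nvrtc_options(
    const std::vector<std::string>& include_dirs,
    int compute_capability)
{
    std::vector<std::string> options;
    options.push_back("-std=c++17");
    options.push_back("--gpu-architecture=compute_" +
                      std::to_string(compute_capability));
    for (const std::string& dir : include_dirs) {
        options.push_back("-I" + dir);  // where the CUB headers live
    }
    return options;
}
```

A caller would pass kBlockSumSource as the program source and the returned strings as the option array. Declaring the kernel extern "C" keeps its name unmangled, which makes it easier to look up in the compiled module.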
Describe the solution you'd like
All CUB warp/block headers should support runtime compilation with NVRTC and/or Jitify.
Furthermore, CUB should expand its testing infrastructure to enable testing device-side algorithm headers.
Tasks
Describe alternatives you've considered
No response
Additional context
No response