Distributing without CUDA #136
-
Hi, I want to distribute an exllamav2-based app without requiring users to install CUDA. I'm getting "CUDA error: no kernel image is available for execution on the device [...]\exllamav2_ext\cuda\rope.cu 131", and I guess that means the extension is not built for the right GPU architecture, which isn't surprising. Is there a way to build the extension with kernels for all architectures and bundle all of that with my app?
Replies: 3 comments
-
Well I got it working by adding this code to setup.py:
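Something along these lines (a sketch: the exact architecture list and extension name are assumptions, and should be adjusted for your CUDA toolkit and target GPUs):

```python
# Hypothetical setup.py snippet: compile the extension for every common CUDA
# architecture so one binary runs on any supported GPU.
# The arch list below is an assumption; trim or extend it for your toolkit.
cuda_archs = ["60", "61", "70", "75", "80", "86", "89", "90"]

# Build one -gencode pair per real architecture (SASS code).
gencode_flags = []
for arch in cuda_archs:
    gencode_flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]

# Also embed PTX for the newest architecture so future GPUs can JIT-compile.
gencode_flags += [
    "-gencode",
    f"arch=compute_{cuda_archs[-1]},code=compute_{cuda_archs[-1]}",
]

# These flags would then be passed to the nvcc part of the extension, e.g.:
# from torch.utils.cpp_extension import CUDAExtension
# ext = CUDAExtension(
#     name="exllamav2_ext",          # assumed name
#     sources=[...],                 # kept elided as in the original
#     extra_compile_args={"cxx": [], "nvcc": gencode_flags},
# )
```

An alternative is setting `TORCH_CUDA_ARCH_LIST` (e.g. `"6.0;7.0;7.5;8.0;8.6;9.0+PTX"`) in the environment before building, which makes torch's build helpers generate equivalent flags.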
Overkill, sure, but the resulting binary is only 70 MB, a lot less than e.g. cuBLAS, so I guess it's fine!
-
Isn't this more or less what's already in the releases, just with more architectures? And wouldn't it still build for a specific Python and CUDA version?
-
I don't know; I didn't see any documentation about how those builds were produced. I wanted a build with my own changes, so I couldn't use the releases. Yes, I expect it is specific to a Python version, but it will run without CUDA installed at all. Maybe it needs to match the CUDA version that torch was compiled with, but I don't know.