ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting disable_exllama=True in the quantization config object #2459
When trying to load quantized models I always get:

ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
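For anyone landing here: the message itself points at the workaround. A minimal sketch, assuming a transformers version that still accepts the disable_exllama flag; the checkpoint id is a placeholder:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint

# disable_exllama=True is exactly what the ValueError suggests; it lets the
# model load even when some modules are offloaded to CPU or disk.
gptq_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)

# Entries mapped to "cpu" or "disk" show which modules triggered the error.
print(model.hf_device_map)
```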
Comments

Hi, may I ask how you load the model? In my case, with a single GPU, I also had that problem and I had to use …
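The snippet above was cut off; a sketch of one common single-GPU setup that avoids the error altogether is pinning every module to the GPU. This is an assumption about what was meant, and it only works if the model fits in VRAM:

```python
from transformers import AutoModelForCausalLM

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint

# device_map={"": 0} places the entire model on GPU 0, so nothing is
# offloaded to CPU/disk and the Exllama kernels can stay enabled.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0})
```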
That works, thanks. @aliozts
Disabling Exllama makes inference much slower. Check out AutoGPTQ/AutoGPTQ#406 for how to enable Exllama.
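For reference, a sketch of explicitly opting in to the exllamav2 kernels through GPTQConfig, assuming a transformers version that supports the exllama_config argument; the model id is a placeholder:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint

# Opt in to the exllamav2 kernels. All modules must be on GPU for this to
# work, hence device_map={"": 0} rather than "auto".
gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map={"": 0}, quantization_config=gptq_config
)
```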
Why can't I run it on the GPU? I have an NVIDIA GeForce MX450.
I have an NVIDIA GTX 1650 and I'm still getting the same error.
Adding "disable_exllama": true under quantization_config in config.json solves the problem.
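The same edit, scripted, in case you'd rather not touch the file by hand; the path is a placeholder:

```python
import json

# Path to the downloaded checkpoint's config.json (placeholder path).
config_path = "path/to/model/config.json"

with open(config_path) as f:
    config = json.load(f)

# Equivalent to the manual edit described above.
config["quantization_config"]["disable_exllama"] = True

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```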
I was running into a similar problem running GPTQ in a Docker container and was getting the same error. GPU: 1660 Ti. SOLUTION: for me, I fixed the …
I am also facing the same issue. Disabling exllama slows inference down a lot, so I'm not sure that's the ideal fix. Here are more details: #3530
I ran into the same problem: launching the model Qwen2.5-32B-Instruct-GPTQ-Int4 with the dbgpt project failed with the error in the title. Locating the config.json file and changing use_exllama from true to false solved it. Hope this helps.
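A sketch of the equivalent fix at load time, assuming a newer transformers version where disable_exllama was renamed to use_exllama; the Hub id for the Qwen checkpoint is an assumption:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4"  # assumed Hub id

# use_exllama=False is the newer spelling of disable_exllama=True.
gptq_config = GPTQConfig(bits=4, use_exllama=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)
```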