ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting disable_exllama=True in the quantization config object #2459
When trying to load quantized models I always get:

ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
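For anyone landing here: the message itself points at the workaround. A minimal sketch, assuming a transformers version that still accepts the disable_exllama flag; the checkpoint id is a placeholder:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint

# disable_exllama=True is exactly what the ValueError suggests; it lets the
# model load even when some modules are offloaded to CPU or disk.
gptq_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)

# Entries mapped to "cpu" or "disk" show which modules triggered the error.
print(model.hf_device_map)
```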
Comments

Hi, may I ask how you load the model? In my case, with a single GPU, I also had that problem and I had to use …
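The snippet above was cut off; a sketch of one common single-GPU setup that avoids the error altogether is pinning every module to the GPU. This is an assumption about what was meant, and it only works if the model fits in VRAM:

```python
from transformers import AutoModelForCausalLM

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint

# device_map={"": 0} places the entire model on GPU 0, so nothing is
# offloaded to CPU/disk and the Exllama kernels can stay enabled.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0})
```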
That works, thanks. @aliozts
Disabling Exllama makes inference much slower. Check out AutoGPTQ/AutoGPTQ#406 for how to enable Exllama.
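For reference, a sketch of explicitly opting in to the exllamav2 kernels through GPTQConfig, assuming a transformers version that supports the exllama_config argument; the model id is a placeholder:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint

# Opt in to the exllamav2 kernels. All modules must be on GPU for this to
# work, hence device_map={"": 0} rather than "auto".
gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map={"": 0}, quantization_config=gptq_config
)
```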
Why can't I run it on the GPU? I have an NVIDIA GeForce MX450.
I have an NVIDIA GTX 1650 and I'm still getting the same error.
Adding "disable_exllama": true under quantization_config in config.json solves the problem.
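The same edit, scripted, in case you'd rather not touch the file by hand; the path is a placeholder:

```python
import json

# Path to the downloaded checkpoint's config.json (placeholder path).
config_path = "path/to/model/config.json"

with open(config_path) as f:
    config = json.load(f)

# Equivalent to the manual edit described above.
config["quantization_config"]["disable_exllama"] = True

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```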
I was running into a similar problem running GPTQ in a Docker container and was getting the same error. GPU: 1660 Ti. SOLUTION: for me, I fixed the …
I am also facing the same issue. Disabling exllama slows inference down a lot, so I'm not sure that's the ideal fix. Here are more details: #3530
I ran into the same problem: launching the model Qwen2.5-32B-Instruct-GPTQ-Int4 with the dbgpt project failed with the error in the title. Locating the config.json file and changing use_exllama from true to false solved it. Hope this helps.
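A sketch of the equivalent fix at load time, assuming a newer transformers version where disable_exllama was renamed to use_exllama; the Hub id for the Qwen checkpoint is an assumption:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4"  # assumed Hub id

# use_exllama=False is the newer spelling of disable_exllama=True.
gptq_config = GPTQConfig(bits=4, use_exllama=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)
```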