-
@irexyc, could you please take a look?
-
You could do it this way:
-
Hi, I am getting the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory
while deploying a 4-bit quantized Llama 2 70B model with the following command:
python3 -m lmdeploy.serve.turbomind.deploy llama2 llama2-chat-70b-w4 --model-format awq --group-size 128 --tp 4
I am using a 4 x NVIDIA A10G (24 GB VRAM each) configuration, and this deployment command uses only one of the 4 GPUs and runs out of memory. Is there a way to use all 4 GPUs for the deploy command?
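A quick sanity check like the one below can confirm whether all four GPUs are actually visible to the process and how much memory each one reports. This is a minimal diagnostic sketch using plain PyTorch, not an lmdeploy API, and is only meant to rule out a masked-device (e.g. CUDA_VISIBLE_DEVICES) issue:

```python
# Minimal diagnostic sketch (plain PyTorch, not part of lmdeploy):
# list the GPUs visible to this process and their total memory.
import torch

print(f"Visible CUDA devices: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"  cuda:{i} {props.name}: {props.total_memory / 1024**3:.1f} GB total")
```

If this reports all 4 A10Gs, the devices are visible and the out-of-memory error is happening in the conversion step itself rather than because some GPUs are hidden from the process.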