-
@irexyc, could you please take a look?
-
You could do it this way:
-
Hi, I am getting the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory
while deploying a 4-bit quantized Llama 2 70B model with the following command:
python3 -m lmdeploy.serve.turbomind.deploy llama2 llama2-chat-70b-w4 --model-format awq --group-size 128 --tp 4
I am using a 4 x NVIDIA A10G (24 GB VRAM each) configuration, and this deployment command uses only one of the 4 GPUs and runs out of memory. Is there a way to use all 4 GPUs for the deploy command?
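A quick sanity check like the one below can confirm whether all four GPUs are actually visible to the process and how much memory each one reports. This is a minimal diagnostic sketch using plain PyTorch, not an lmdeploy API, and is only meant to rule out a masked-device (e.g. CUDA_VISIBLE_DEVICES) issue:

```python
# Minimal diagnostic sketch (plain PyTorch, not part of lmdeploy):
# list the GPUs visible to this process and their total memory.
import torch

print(f"Visible CUDA devices: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"  cuda:{i} {props.name}: {props.total_memory / 1024**3:.1f} GB total")
```

If this reports all 4 A10Gs, the devices are visible and the out-of-memory error is happening in the conversion step itself rather than because some GPUs are hidden from the process.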