vLLM quantization BrokenPipeError #252
Comments
I don't think this is because of NF4 or quantization in general. vLLM multi-GPU used to work in optimum-benchmark when it used Ray for distribution, but the last time I tried it, it didn't work, so I would appreciate a PR if you can get it working.
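For context, here is a minimal sketch of how vLLM itself is asked to shard a model across two GPUs when used directly, outside optimum-benchmark. The model id is the one discussed in this thread; exact behaviour depends on your installed vLLM version.

```python
# Sketch: sharding a model across two GPUs with vLLM directly
# (outside optimum-benchmark). Assumes a recent vLLM release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B",
    tensor_parallel_size=2,  # one shard per GPU
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```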
Thanks for your answer! I understand about the use of dual GPUs. I do think the BrokenPipeError happens because of the quantization, though: I can run the benchmark in the same configuration with the full meta-llama/Meta-Llama-3.1-8B model on one GPU, but if I try the bnb NF4 quantized version hugging-quants/Meta-Llama-3.1-8B-BNB-NF4, it throws the error.
The broken pipe error is not a problem in itself; it happens when the process you're running the benchmark in exits abruptly.
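In other words, the BrokenPipeError is a secondary symptom: the launcher writes to a pipe whose other end belongs to a worker process that has already died, and the original failure (for example during model loading) is what actually needs fixing. A minimal, self-contained sketch of that mechanism, unrelated to vLLM or optimum-benchmark:

```python
# Sketch: how a BrokenPipeError surfaces in a parent process when the
# worker it spawned has already exited (behaviour shown is for Linux).
import multiprocessing as mp

def worker(reader):
    # Simulate the benchmark worker dying before it ever reads,
    # e.g. because model loading failed inside the isolated process.
    reader.close()

if __name__ == "__main__":
    reader, writer = mp.Pipe(duplex=False)
    proc = mp.Process(target=worker, args=(reader,))
    proc.start()
    reader.close()   # parent drops its copy of the read end
    proc.join()      # the worker has already exited
    try:
        writer.send("benchmark report")  # writing into a pipe with no readers
    except BrokenPipeError as err:
        print(f"Launcher sees a secondary error, not the root cause: {err!r}")
```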
Any news?
Hello,
I am trying to benchmark a model quantized to nf4 with bnb. How can I run it with the vLLM backend without getting a BrokenPipeError? Also, how can I utilize both GPUs of my machine?
Thank you for your help!
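For the quantized checkpoint specifically, one way to isolate the problem is to load it in vLLM directly, outside optimum-benchmark, so that any real loading error is not masked by the launcher's BrokenPipeError. The sketch below assumes a vLLM release with bitsandbytes support; argument names and version compatibility should be checked against your installed vLLM, and in some releases bitsandbytes quantization has not been usable together with tensor parallelism.

```python
# Sketch: loading the pre-quantized bnb NF4 checkpoint in vLLM directly.
# Assumes a vLLM version with bitsandbytes support; check your release notes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="hugging-quants/Meta-Llama-3.1-8B-BNB-NF4",
    quantization="bitsandbytes",  # weights are bnb-quantized
    load_format="bitsandbytes",   # load the pre-quantized checkpoint as-is
)

out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```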