V100 GPU: loading the quantized model Yi-34B-Chat-4bits, inference is very slow #484
Comments
@zxdposter Hi, how many GPUs does 34B inference need? Does it require multiple cards?
Same here. 8x 4090, and it is still slow.
Hi, which GPTQ version are you on? I didn't see an AutoGPTQ build for PyTorch 2.1.2 on the official release page.
@lyan62 It needs roughly 20-30 GB of VRAM.
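The 20-30 GB figure above is consistent with a back-of-the-envelope estimate (a rough sketch; the exact footprint also depends on quantization group size, KV-cache length, and runtime overhead):

```python
# Rough VRAM estimate for a 34B-parameter model quantized to 4 bits.
params = 34e9                   # parameter count of Yi-34B
bytes_per_param = 0.5           # 4 bits = half a byte per weight
weights_gb = params * bytes_per_param / 1024**3
print(f"{weights_gb:.1f} GiB")  # -> 15.8 GiB for the weights alone;
# quantization scales/zeros, the KV cache, and activations push real
# usage toward the reported 20-30 GB
```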
It really is too slow. Is there any good workaround?
This may be caused by running `pip install -r requirements.txt` directly, which can leave you with an unusable torch build.
@ChinesePainting Thanks for the fix; I'll try it later.
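The suggestion above is easy to verify. A minimal sketch, assuming the suspected failure mode is a CPU-only torch wheel pulled in by `pip install -r requirements.txt`:

```python
def torch_cuda_status():
    """Report whether the installed torch build can actually see a GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        # A CPU-only wheel would make generation fall back to the CPU,
        # which could explain ~200 s responses despite free VRAM.
        return f"torch {torch.__version__} cannot see a CUDA device"
    return f"torch {torch.__version__} with CUDA {torch.version.cuda}"

print(torch_cuda_status())
```

If this reports a build that cannot see a CUDA device, reinstalling torch from the CUDA wheel index matching your driver, before installing the rest of the requirements, should restore GPU inference.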
Reminder
Environment
Current Behavior
On a V100 GPU, after loading the quantized model Yi-34B-Chat-4bits, inference is very slow: about 200 seconds per response, with 20 GB of VRAM used and 10 GB free.
Is there a way to fix this?
I searched the issues; several people have hit this, but none of the threads has a solution.
Expected Behavior
No response
Steps to Reproduce
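The reproduction script itself is not quoted in the thread. A minimal sketch, assuming the standard `transformers` loading path for the GPTQ checkpoint (the prompt and generation parameters are illustrative, not the reporter's exact values):

```python
import time

def generate_reply(model_path="01-ai/Yi-34B-Chat-4bits", prompt="你是谁?"):
    """Load the 4-bit checkpoint and time one generation (requires a GPU)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy deps kept local
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, device_map="auto", torch_dtype="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    start = time.time()
    output = model.generate(input_ids, max_new_tokens=256)
    reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return time.time() - start, reply

if __name__ == "__main__":
    elapsed, reply = generate_reply()
    print(f"Elapsed {elapsed} {reply}")
```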
Output:
Elapsed 194.49595594406128 — I am an intelligent assistant developed by 零一万物 (01.AI). My name is Yi. I was trained by 01.AI's researchers on large amounts of text data, learning the patterns and associations of language, so I can generate text, answer questions, and translate languages. I can help users answer questions, provide information, and perform various language-related tasks. I am not a real person but am built from code and algorithms, though I do my best to mimic human communication so that I can interact with users more naturally. If you have any questions or need help, just let me know!
Anything Else?
No response