Very slow, the GPU is not being effectively utilized #38
Comments
You can try bumping the CUDA/torch toolchain to newer versions (torch capped at 2.2.1 or 2.2.0), though it may not help much. In my local tests, CUDA can be fully loaded, but the utilization curve is not flat and fluctuates a lot. I'll try a few other approaches later.
torch==2.2.0
Still very slow: one iteration takes more than 10 minutes, and a 5-second audio clip takes over an hour to finish.
Did you enable lowram mode? Also, how much VRAM does your GPU have?
I did not enable lowram, and I have 12 GB of VRAM. I'm also using the acc version of the model.
On my 4070 12G, a clip of a few minutes only takes a few minutes to run. I looked it up, and the GTX TITAN X apparently doesn't support mixed precision. Try editing line 22 of utils.py and changing torch.float16 to torch.float32.
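As a side note, the float16-vs-float32 choice above can be automated instead of hard-coded. The sketch below is a hypothetical helper (the function name `pick_dtype` and the capability-7.0 cutoff are my illustration, not code from this repo): GPUs below CUDA compute capability 7.0 (Volta) have no Tensor Cores, so float16 autocast is often slower than plain float32 there. The GTX TITAN X is Maxwell, compute capability 5.2.

```python
def pick_dtype(capability):
    """Choose a torch dtype name from a CUDA compute capability tuple.

    `capability` is a (major, minor) pair, as returned by
    torch.cuda.get_device_capability(). GPUs below capability 7.0
    lack Tensor Cores, so fall back to full precision there.
    """
    major, _minor = capability
    return "torch.float16" if major >= 7 else "torch.float32"

print(pick_dtype((5, 2)))  # GTX TITAN X (Maxwell)  -> torch.float32
print(pick_dtype((8, 9)))  # RTX 4070 (Ada)         -> torch.float16
```

In a real utils.py you would feed it `torch.cuda.get_device_capability()` and use the resulting dtype instead of an unconditional torch.float16.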
Thanks! Could this be related to a problem during installation? When I ran pip install facenet-pytorch, it reported "facenet-pytorch 2.6.0 requires torch<2.3.0,>=2.2.0" and "uninstalling torch 2.5.0 will break other nodes". Since I already had torch 2.5.0 installed and didn't want to reinstall it, I added --no-deps and installed facenet-pytorch that way. It does run, but I don't know whether the slowness is related to this.
In any case, thank you very much.
CUDA versions should in principle be backward compatible, so this is more likely a half-precision vs. full-precision issue. The problem with facenet-pytorch is that, by default, installing it force-installs the plain torch wheel rather than torch with CUDA (and it uninstalls the one you already have unless you use --no-deps), which is why many ComfyUI users end up unable to launch because CUDA can't be found.
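The install order described above can be sketched as follows. This is a hedged example, not an official recipe: the cu118 index URL matches the CUDA 11.8 environment mentioned later in this thread, and you should substitute whichever torch version and CUDA tag your setup needs.

```shell
# 1. Install a CUDA-enabled torch wheel first (cu118 shown as an example)
pip install torch --index-url https://download.pytorch.org/whl/cu118

# 2. Install facenet-pytorch without letting pip replace the existing torch
pip install facenet-pytorch --no-deps

# 3. Verify the CUDA build survived the second install
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If step 3 prints False, pip most likely swapped in the CPU-only torch wheel, which matches the ComfyUI failure mode described above.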
Got it, understood. Thanks for the detailed explanation.
2024-07-31 20:15:45.650643419 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-07-31 20:15:45.652126712 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-31 20:15:45.652142112 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
CUDA 11.8
cuDNN 8.9.2
onnxruntime 1.8