Very slow: the GPU is not being used effectively #38

Open
leegang opened this issue Jul 31, 2024 · 10 comments

Comments

leegang commented Jul 31, 2024

2024-07-31 20:15:45.650643419 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-07-31 20:15:45.652126712 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-31 20:15:45.652142112 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.


CUDA 11.8
cuDNN 8.9.2
onnxruntime 1.8
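
These warnings mean that some graph nodes were not assigned to CUDAExecutionProvider, so onnxruntime inserts Memcpy nodes to shuttle tensors between CPU and GPU. A minimal sketch, not specific to this repo, for turning on verbose logging and checking which providers a session actually registered (the model path is a placeholder):

```python
import onnxruntime as ort

# Verbose session logging (0 = VERBOSE) prints the per-node provider
# assignments that the warning above refers to.
so = ort.SessionOptions()
so.log_severity_level = 0

# "model.onnx" is a placeholder for whatever model this node loads.
sess = ort.InferenceSession(
    "model.onnx",
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# If only CPUExecutionProvider shows up here, the CUDA provider failed to
# load (e.g. onnxruntime-gpu missing or built against a different CUDA).
print(sess.get_providers())
```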

smthemex (Owner) commented Aug 1, 2024

You could try bumping CUDA and the torch package set to newer versions (torch capped at 2.2.1 or 2.2.0), though it may not help much. In my own tests CUDA can run at full load, but the utilization curve isn't smooth and fluctuates a lot. I'll try a few other approaches later.
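
For reference, a generic way to confirm which torch build ends up active after upgrading (this assumes torchvision and torchaudio are installed alongside torch, which may not hold in every environment):

```python
import torch, torchvision, torchaudio

# The three packages should come from the same release line and the same
# CUDA wheel index (e.g. all cu118 or all cu121 builds).
print("torch      :", torch.__version__, "| built for CUDA", torch.version.cuda)
print("torchvision:", torchvision.__version__)
print("torchaudio :", torchaudio.__version__)
print("cuDNN      :", torch.backends.cudnn.version())
print("GPU usable :", torch.cuda.is_available())
```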

leegang (Author) commented Aug 1, 2024

> You could try bumping CUDA and the torch package set to newer versions (torch capped at 2.2.1 or 2.2.0), though it may not help much. In my own tests CUDA can run at full load, but the utilization curve isn't smooth and fluctuates a lot. I'll try a few other approaches later.

torch==2.2.0

@peizhiluo007

It's also very slow for me: one iteration takes more than 10 minutes, and a 5-second audio clip takes over an hour to finish.
GPU utilization also shows 100%.

@smthemex (Owner)

Did you enable lowram mode? And how much VRAM does your card have?

peizhiluo007 commented Oct 28, 2024

> Did you enable lowram mode? And how much VRAM does your card have?

I didn't enable lowram; the card has 12 GB of VRAM.
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX TITAN X Off | 00000000:06:00.0 Off | N/A |
| 49% 82C P2 154W / 250W | 9676MiB / 12288MiB | 100% Default |
| | | N/A |

And I'm using the acc version of the model.

@smthemex (Owner)

On my 4070 12G it only takes a few minutes to run a clip. I looked it up and the GTX TITAN X apparently doesn't support mixed precision; try editing line 22 of utils.py and changing torch.float16 to torch.float32.
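
For anyone who prefers not to hand-edit, the same idea can be expressed as an automatic check. A minimal sketch, assuming line 22 of utils.py simply hard-codes the weight dtype (the variable name below is hypothetical): select float16 only when the GPU's compute capability indicates fast native FP16.

```python
import torch

def pick_weight_dtype() -> torch.dtype:
    """Use float16 only on GPUs with fast FP16 (compute capability >= 7.0,
    i.e. Volta/Turing and newer); Maxwell cards like the GTX TITAN X
    (capability 5.2) fall back to float32."""
    if torch.cuda.is_available():
        major, _ = torch.cuda.get_device_capability(0)
        if major >= 7:
            return torch.float16
    return torch.float32

# Hypothetical stand-in for the hard-coded dtype at utils.py line 22.
weight_dtype = pick_weight_dtype()
```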

peizhiluo007 commented Oct 28, 2024

> On my 4070 12G it only takes a few minutes to run a clip. I looked it up and the GTX TITAN X apparently doesn't support mixed precision; try editing line 22 of utils.py and changing torch.float16 to torch.float32.

Thanks a lot.
1. After changing it straight to torch.float32, I got an out-of-memory error.
2. I then tried changing to torch.float32 and enabling lowram at the same time. That is indeed almost twice as fast as before: one iteration takes roughly 3-4 minutes, and the 5-second clip finishes in about 20 minutes.

One more question: could this be related to a problem during installation? When I ran pip install facenet-pytorch, it warned "facenet-pytorch 2.6.0 requires torch<2.3.0,>=2.2.0", and uninstalling torch 2.5.0 would break other nodes. Since I already had torch 2.5.0 installed and didn't want to reinstall it, I added --no-deps and installed facenet-pytorch that way. It does run, but could the slowness be related to this?

@peizhiluo007

In any case, thank you very much.
After switching to torch.float32 and enabling lowram, it really is much faster.

@smthemex (Owner)

CUDA versions should in principle be backward compatible, so this is still a half-precision vs. full-precision issue. The main problem with facenet-pytorch is that by default it force-installs plain torch instead of the CUDA build of torch (and it will even uninstall the one you already have if you don't use --no-deps), which is why many ComfyUI users end up unable to start because CUDA can't be found.
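
A generic way to check whether that happened, i.e. whether a CPU-only torch wheel replaced the CUDA build, is to look at the compiled-in CUDA version:

```python
import torch

# A CPU-only wheel reports no compiled-in CUDA version even when a GPU
# and driver are present on the machine.
if torch.version.cuda is None:
    print("CPU-only torch build detected; reinstall a CUDA-enabled wheel.")
elif not torch.cuda.is_available():
    print("torch was built for CUDA", torch.version.cuda,
          "but no usable GPU/driver was found.")
else:
    print("OK:", torch.__version__, "running on", torch.cuda.get_device_name(0))
```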

@peizhiluo007

> CUDA versions should in principle be backward compatible, so this is still a half-precision vs. full-precision issue. The main problem with facenet-pytorch is that by default it force-installs plain torch instead of the CUDA build of torch (and it will even uninstall the one you already have if you don't use --no-deps), which is why many ComfyUI users end up unable to start because CUDA can't be found.

OK, understood. Thanks for the detailed explanation.
