Inference bottlenecked by _local_scalar_dense running on CPU #1050

JasonWongFGF · 2024-05-22T16:16:56Z

Before Asking

I have read the README carefully.
I want to train my custom dataset, and I have read the tutorials for training on custom data carefully and organized my dataset correctly. (FYI: We recommend using the xx_finetune.py config files.)
I have pulled the latest code from the main branch and run it again, and the problem still exists.

Search before asking

I have searched the YOLOv6 issues and found no similar questions.

Question

Using torch.autograd.profiler during inference, these are my results

Note that _local_scalar_dense runs on the CPU rather than on CUDA, unlike the rest of the function calls. As a result, my inference speed is heavily bottlenecked by the three _local_scalar_dense calls.
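For context, aten::_local_scalar_dense is the operator PyTorch dispatches to when Tensor.item() (or an equivalent conversion such as float(tensor)) pulls a single value out of a tensor. The result is a Python scalar, so the copy always lands on the host, and on a CUDA tensor the call blocks until the kernels queued before it have finished; the profiler can then charge that wait to this op. A minimal sketch of how this shows up in a profile is below. The function name profile_item_call and the toy matmul workload are illustrative assumptions, not YOLOv6 code.

```python
def profile_item_call(size=256):
    """Profile a matmul followed by .item() and return the recorded op names."""
    # Imported lazily so this snippet can be loaded without PyTorch installed.
    import torch
    from torch.autograd import profiler

    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(size, size, device=device)

    with profiler.profile() as prof:
        y = x @ x              # on CUDA this is only queued, not yet finished
        total = y.sum()        # still a tensor living on the device
        value = total.item()   # copies one scalar to the host and waits for y

    # CPU-side timings: on a GPU the wait for queued work tends to show up here.
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
    return [evt.key for evt in prof.key_averages()]
```

Running profile_item_call() on a CUDA machine and on a CPU-only machine should both list aten::item and aten::_local_scalar_dense; how much time is attributed to them depends on how much device work is still pending when .item() is reached.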

My questions are:

1. Where is _local_scalar_dense called within the YOLOv6 inference pipeline?
2. Can I force _local_scalar_dense to execute on CUDA?
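One way to approach the first question is to let the profiler record Python stacks and then filter for the op by name. The sketch below assumes the installed PyTorch build supports with_stack=True; find_item_call_sites and toy_postprocess are hypothetical helpers written for illustration, and the toy function merely stands in for whatever post-processing the real pipeline performs.

```python
def find_item_call_sites(fn, op_name="aten::_local_scalar_dense"):
    """Run fn() under the profiler and return the stacks recorded for op_name."""
    # Imported lazily so this snippet can be loaded without PyTorch installed.
    from torch.autograd import profiler

    with profiler.profile(with_stack=True) as prof:
        fn()

    # Each matching event carries the source locations captured at call time
    # (the list may be empty if the build did not record stacks).
    return [
        list(evt.stack or [])
        for evt in prof.function_events
        if evt.name == op_name
    ]


def toy_postprocess():
    """Hypothetical stand-in for a model's post-processing step."""
    import torch

    scores = torch.rand(100)
    return int((scores > 0.5).sum().item())  # .item() is the call being hunted
```

Calling find_item_call_sites(lambda: model(img)) with the real model and a sample input (both placeholders here) would list the source lines that trigger the device-to-host transfer. As for the second question, the op exists to hand a value to Python on the host, so the usual remedy is to avoid the call (keep the value as a tensor) rather than to move it to CUDA.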

I should also mention that this issue is only semi-reproducible: I ran the same code in the same environment on multiple systems and observed the problem on some of them but not all.

Additional

No response

JasonWongFGF added the question Further information is requested label May 22, 2024
