
use for yolo #23

Open
liu15509348793 opened this issue Oct 8, 2024 · 8 comments

Comments

@liu15509348793

Hello, thank you for your work. Can ARC replace 1×1 convolutions? It seems that only 3×3 convolutions can be replaced. In addition, it seems to be particularly slow; is this normal? This is the issue I ran into when I tried to replace normal convolutions with ARC in YOLO.

@yifanpu001
Collaborator

As mentioned in Sec. 3.4 of the paper, a 1×1 convolution is the same after rotation as before, so we do not rotate 1×1 convs.
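The 1×1 case can be checked directly: a rotation about the kernel center permutes the k×k sample positions, and a 1×1 kernel has only its center position, so any rotation maps it to itself. A toy sketch in plain Python (using an exact 90° rotation instead of the paper's arbitrary-angle interpolation):

```python
def rotate90(kernel):
    """Rotate a square k x k kernel (list of lists) 90 degrees counter-clockwise."""
    k = len(kernel)
    return [[kernel[j][k - 1 - i] for j in range(k)] for i in range(k)]

# A 1 x 1 kernel is unchanged by rotation -- there is nothing to move.
print(rotate90([[5.0]]))  # [[5.0]]

# A 3 x 3 kernel genuinely changes, so rotating it is meaningful.
print(rotate90([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [[3, 6, 9], [2, 5, 8], [1, 4, 7]]
```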

@liu15509348793
Author

> As mentioned in Sec. 3.4 of the paper, a 1×1 convolution is the same after rotation as before, so we do not rotate 1×1 convs.

Okay, thanks. Is it normal for it to run very slowly? It feels several times slower than normal convolution.

@yifanpu001
Collaborator

I think it depends on your hardware device and software version. We have tested our model on an RTX 3090 with torch.compile; ARC-ResNet50 has almost the same inference speed as ResNet50. I guess the inference speed would be slower on memory-bound devices, because the ARC module has more memory accesses than a standard conv.

@liu15509348793
Author

> I think it depends on your hardware device and software version. We have tested our model on an RTX 3090 with torch.compile; ARC-ResNet50 has almost the same inference speed as ResNet50. I guess the inference speed would be slower on memory-bound devices, because the ARC module has more memory accesses than a standard conv.

I'm using an A100, which should have enough memory, but when I replace the convolutions in YOLO with ARC, it is three times slower, for some reason.

@yifanpu001
Collaborator

Which YOLO model do you use? How do you apply the ARC module in your model? Do you use an accelerator like Triton or TensorRT?

@liu15509348793
Author

> Which YOLO model do you use? How do you apply the ARC module in your model? Do you use an accelerator like Triton or TensorRT?

I'm using YOLOv8, and I'm simply replacing the 3×3 convolutions with ARC like this:

```yaml
# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, AdaptiveRotatedConv2d, [64, 3, 2]]   # 0-P1/2
  - [-1, 1, AdaptiveRotatedConv2d, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, AdaptiveRotatedConv2d, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, AdaptiveRotatedConv2d, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, AdaptiveRotatedConv2d, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]                     # 9
```

@liu15509348793
Author

```text
Module: ultralytics.nn.modules.conv.Conv, Time: 0.009410 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.005618 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.021514 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.005827 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.020888 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.006252 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.005459 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.005606 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.005005 seconds

Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.062141 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.015157 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.020658 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.014282 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.017815 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.015889 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.010974 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.018189 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.014755 seconds
```

Here are my latest test results on YOLO11. I printed the running time of each module: the top block is the original modules, the bottom block is the modules replaced with ARC. My experiment was done on an A100, but the speeds differ by several times. What is the reason for this?
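One thing worth checking when timing individual modules on a GPU: CUDA kernels launch asynchronously, so a plain wall-clock timer can attribute time to the wrong module (often the first module that forces a synchronization, such as the first ARC layer here, which includes the routing function). A minimal sketch of a per-module timer, using only the standard library; with real torch modules you would call `torch.cuda.synchronize()` where noted:

```python
import time

def time_module(fn, name, *args, **kwargs):
    """Run fn(*args, **kwargs) and print its wall-clock time in the log format above.

    On a GPU, torch.cuda.synchronize() must be called both before starting and
    before stopping the timer; otherwise asynchronous kernel launches make the
    per-module numbers unreliable.
    """
    # torch.cuda.synchronize()  # flush kernels queued by earlier modules
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    # torch.cuda.synchronize()  # wait for this module's kernels to finish
    elapsed = time.perf_counter() - start
    print(f"Module: {name}, Time: {elapsed:.6f} seconds")
    return out, elapsed
```

If the gap between Conv and AdaptiveRotatedConv2d survives proper synchronization, it is a real cost rather than a measurement artifact.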

@liu15509348793
Author

```python
# get alphas, angles
# [bs, Cin, h, w] --> [bs, n_theta], [bs, n_theta]
alphas, angles = self.rounting_func(x)

# rotate weight
# [Cout, Cin, k, k] --> [bs * Cout, Cin, k, k]
rotated_weight = self.rotate_func(self.weight, alphas, angles)
rotated_weight = rotated_weight.to(x.dtype)
```

After debugging, I found that this code is the most time-consuming part: generating the rotation matrices and applying the rotation angles to the weights takes much longer than the convolution itself. Is there a problem with how I am using it, or is there a better way to speed up my training?
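For scale: as the shape comments show, the rotation step materializes a separate [Cout, Cin, k, k] weight for every sample in the batch (bs * Cout output channels in total), and each of those is a weighted sum of n_theta rotated copies of the shared kernel. That is roughly n_theta * bs * Cout * Cin * k² extra memory traffic per forward pass, which can easily dominate a small 3×3 conv at YOLO-sized batch sizes; this matches the memory-bound explanation above, and fusing the step with torch.compile is one way to reduce it. A toy, pure-Python version of the combination for a single sample and a single k×k kernel (hypothetical names; the real module does this on full weight tensors):

```python
def combine_rotated(weight, alphas, rotate_fn):
    """Weighted sum of rotated copies of one k x k kernel.

    weight:    k x k kernel as a list of lists
    alphas:    per-branch routing weights, length n_theta
    rotate_fn: rotate_fn(weight, i) -> the i-th rotated copy of the kernel
    """
    k = len(weight)
    out = [[0.0] * k for _ in range(k)]
    for i, a in enumerate(alphas):    # n_theta passes over the kernel ...
        rot = rotate_fn(weight, i)    # ... each materializing a rotated copy
        for r in range(k):
            for c in range(k):
                out[r][c] += a * rot[r][c]
    return out

# Sanity check: with identity "rotations" and alphas summing to 1,
# the combined kernel equals the original.
w = [[1.0, 2.0], [3.0, 4.0]]
print(combine_rotated(w, [0.5, 0.5], lambda kern, i: kern))  # [[1.0, 2.0], [3.0, 4.0]]
```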
