
use for yolo #23

Open
liu15509348793 opened this issue Oct 8, 2024 · 8 comments

Comments

@liu15509348793

Hello, thank you for your work. Can ARC replace 1×1 convolutions? It seems that only 3×3 convolutions can be replaced. In addition, it seems to be particularly slow; is this normal? This is the issue I ran into when I tried to replace normal convolutions with ARC in YOLO.

@yifanpu001
Collaborator

As mentioned in Sec. 3.4 of the paper, a 1×1 convolution is the same after rotation as before, so we do not rotate 1×1 convs.
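The 1×1 case can be checked directly: a rotation about the kernel center permutes the k×k sample positions, and a 1×1 kernel has only its center position, so any rotation maps it to itself. A toy sketch in plain Python (using an exact 90° rotation instead of the paper's arbitrary-angle interpolation):

```python
def rotate90(kernel):
    """Rotate a square k x k kernel (list of lists) 90 degrees counter-clockwise."""
    k = len(kernel)
    return [[kernel[j][k - 1 - i] for j in range(k)] for i in range(k)]

# A 1 x 1 kernel is unchanged by rotation -- there is nothing to move.
print(rotate90([[5.0]]))  # [[5.0]]

# A 3 x 3 kernel genuinely changes, so rotating it is meaningful.
print(rotate90([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [[3, 6, 9], [2, 5, 8], [1, 4, 7]]
```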

@liu15509348793
Author

> As mentioned in Sec. 3.4 of the paper, a 1×1 convolution is the same after rotation as before, so we do not rotate 1×1 convs.

Okay, thanks. Is it normal for it to run very slowly? It feels several times slower than normal convolution.

@yifanpu001
Collaborator

I think it depends on your hardware device and software version. We have tested our model on an RTX 3090 with torch.compile; ARC-ResNet50 has almost the same inference speed as ResNet50. I guess the inference speed would be slower on memory-bound devices, because the ARC module has more memory accesses than a standard conv.

@liu15509348793
Author

> I think it depends on your hardware device and software version. We have tested our model on an RTX 3090 with torch.compile; ARC-ResNet50 has almost the same inference speed as ResNet50. I guess the inference speed would be slower on memory-bound devices, because the ARC module has more memory accesses than a standard conv.

I'm using an A100, which should have enough memory, but when I replace the convolutions in YOLO with ARC, it is three times slower, for some reason.

@yifanpu001
Collaborator

Which YOLO model do you use? How do you apply the ARC module in your model? Do you use an accelerator like Triton or TensorRT?

@liu15509348793
Author

> Which YOLO model do you use? How do you apply the ARC module in your model? Do you use an accelerator like Triton or TensorRT?

I'm using YOLOv8, and I'm simply replacing the 3×3 convolutions with ARC like this:

```yaml
# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, AdaptiveRotatedConv2d, [64, 3, 2]]   # 0-P1/2
  - [-1, 1, AdaptiveRotatedConv2d, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, AdaptiveRotatedConv2d, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, AdaptiveRotatedConv2d, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, AdaptiveRotatedConv2d, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]                     # 9
```

@liu15509348793
Author

```text
Module: ultralytics.nn.modules.conv.Conv, Time: 0.009410 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.005618 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.021514 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.005827 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.020888 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.006252 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.005459 seconds
Module: ultralytics.nn.modules.conv.Conv, Time: 0.005606 seconds
Module: ultralytics.nn.modules.block.C3k2, Time: 0.005005 seconds

Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.062141 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.015157 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.020658 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.014282 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.017815 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.015889 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.010974 seconds
Module: ultralytics.nn.arc.adaptive_rotated_conv.AdaptiveRotatedConv2d, Time: 0.018189 seconds
Module: ultralytics.nn.modules.block.C3k2_ARC, Time: 0.014755 seconds
```

Here are my latest test results on YOLO11. I printed the running time of each module: the top block is the original modules, the bottom block is the modules replaced with ARC. My experiment was done on an A100, but the speeds differ by several times. What is the reason for this?
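One thing worth checking when timing individual modules on a GPU: CUDA kernels launch asynchronously, so a plain wall-clock timer can attribute time to the wrong module (often the first module that forces a synchronization, such as the first ARC layer here, which includes the routing function). A minimal sketch of a per-module timer, using only the standard library; with real torch modules you would call `torch.cuda.synchronize()` where noted:

```python
import time

def time_module(fn, name, *args, **kwargs):
    """Run fn(*args, **kwargs) and print its wall-clock time in the log format above.

    On a GPU, torch.cuda.synchronize() must be called both before starting and
    before stopping the timer; otherwise asynchronous kernel launches make the
    per-module numbers unreliable.
    """
    # torch.cuda.synchronize()  # flush kernels queued by earlier modules
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    # torch.cuda.synchronize()  # wait for this module's kernels to finish
    elapsed = time.perf_counter() - start
    print(f"Module: {name}, Time: {elapsed:.6f} seconds")
    return out, elapsed
```

If the gap between Conv and AdaptiveRotatedConv2d survives proper synchronization, it is a real cost rather than a measurement artifact.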

@liu15509348793
Author

```python
# get alphas, angles
# [bs, Cin, h, w] --> [bs, n_theta], [bs, n_theta]
alphas, angles = self.rounting_func(x)

# rotate weight
# [Cout, Cin, k, k] --> [bs * Cout, Cin, k, k]
rotated_weight = self.rotate_func(self.weight, alphas, angles)
rotated_weight = rotated_weight.to(x.dtype)
```

After debugging, I found that this code is the most time-consuming part: generating the rotation matrices and applying the rotation angles to the weights takes much longer than the convolution itself. Is there a problem with how I am using it, or is there a better way to speed up my training?
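For scale: as the shape comments show, the rotation step materializes a separate [Cout, Cin, k, k] weight for every sample in the batch (bs * Cout output channels in total), and each of those is a weighted sum of n_theta rotated copies of the shared kernel. That is roughly n_theta * bs * Cout * Cin * k² extra memory traffic per forward pass, which can easily dominate a small 3×3 conv at YOLO-sized batch sizes; this matches the memory-bound explanation above, and fusing the step with torch.compile is one way to reduce it. A toy, pure-Python version of the combination for a single sample and a single k×k kernel (hypothetical names; the real module does this on full weight tensors):

```python
def combine_rotated(weight, alphas, rotate_fn):
    """Weighted sum of rotated copies of one k x k kernel.

    weight:    k x k kernel as a list of lists
    alphas:    per-branch routing weights, length n_theta
    rotate_fn: rotate_fn(weight, i) -> the i-th rotated copy of the kernel
    """
    k = len(weight)
    out = [[0.0] * k for _ in range(k)]
    for i, a in enumerate(alphas):    # n_theta passes over the kernel ...
        rot = rotate_fn(weight, i)    # ... each materializing a rotated copy
        for r in range(k):
            for c in range(k):
                out[r][c] += a * rot[r][c]
    return out

# Sanity check: with identity "rotations" and alphas summing to 1,
# the combined kernel equals the original.
w = [[1.0, 2.0], [3.0, 4.0]]
print(combine_rotated(w, [0.5, 0.5], lambda kern, i: kern))  # [[1.0, 2.0], [3.0, 4.0]]
```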
