set localsize as subgroup size #2483

Open, wants to merge 8 commits into master

nihui (Member) commented Dec 21, 2020

No description provided.

monkeyking (Contributor) left a comment:

[Image: CPU benchmark on Apple silicon M1; left = without this patch, right = with this patch]

monkeyking (Contributor):

[Image: GPU benchmark on Apple silicon M1; left = without this patch, right = with this patch]

cavalleria (Contributor):

Device: vivo iQOO Pro 5G, Snapdragon 855
Without this patch: [image WechatIMG41]
With this patch: [image WechatIMG42]
CPU: [image WechatIMG40]

codecov-io commented Dec 26, 2020

Codecov Report

Merging #2483 (4c3fbb5) into master (f47fbcb) will increase coverage by 0.61%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2483      +/-   ##
==========================================
+ Coverage   87.63%   88.25%   +0.61%     
==========================================
  Files         478      220     -258     
  Lines       92160    56216   -35944     
==========================================
- Hits        80766    49611   -31155     
+ Misses      11394     6605    -4789     
Impacted Files Coverage Δ
src/layer/requantize.cpp 17.58% <0.00%> (-26.38%) ⬇️
src/layer/dequantize.cpp 20.28% <0.00%> (-26.09%) ⬇️
src/allocator.cpp 70.79% <0.00%> (-5.76%) ⬇️
src/layer/crop.cpp 79.09% <0.00%> (-3.54%) ⬇️
src/mat.h 88.58% <0.00%> (-2.02%) ⬇️
src/cpu.cpp 56.83% <0.00%> (-1.51%) ⬇️
src/allocator.h 90.00% <0.00%> (-0.91%) ⬇️
src/layer/padding.cpp 69.45% <0.00%> (-0.73%) ⬇️
src/paramdict.cpp 49.70% <0.00%> (-0.30%) ⬇️
src/option.cpp 100.00% <0.00%> (ø)
... and 262 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f47fbcb...4c3fbb5.

zchrissirhcz (Contributor) commented Dec 26, 2020

Device: Xiaomi Mi 8 (小米8), Snapdragon 845

"Without this patch" means stock tencent/ncnn at commit 25506cf.

Build: ndk-r21b, Android arm64, release mode

(The GPU benchmark results posted earlier were wrong; they are now updated. @nihui)

GPU results

without this patch

dipper:/data/pixel # ./benchncnn 4 1 0 0 1
[0 Adreno (TM) 630]  queueC=0[3]  queueG=0[3]  queueT=0[3]
[0 Adreno (TM) 630]  bugsbn1=0  bugcopc=0  bugihfa=1
[0 Adreno (TM) 630]  fp16p=1  fp16s=0  fp16a=1  int8s=0  int8a=0
[0 Adreno (TM) 630]  subgroup=64  basic=1  vote=1  ballot=0  shuffle=0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = 0
cooling_down = 1
          squeezenet  min =   34.88  max =   43.62  avg =   37.28
     squeezenet_int8  min =   35.79  max =   36.29  avg =   36.02
           mobilenet  min =   19.58  max =   19.75  avg =   19.66
      mobilenet_int8  min =   74.30  max =   74.64  avg =   74.52
        mobilenet_v2  min =   39.37  max =   40.94  avg =   40.09
        mobilenet_v3  min =   45.98  max =   48.61  avg =   47.19
          shufflenet  min =   26.38  max =   33.67  avg =   29.20
       shufflenet_v2  min =   32.89  max =   33.92  avg =   33.31
             mnasnet  min =   43.99  max =   44.83  avg =   44.47
     proxylessnasnet  min =   49.11  max =   49.37  avg =   49.28
     efficientnet_b0  min =   33.43  max =   34.00  avg =   33.63
        regnety_400m  min =   24.62  max =   25.58  avg =   25.14
           blazeface  min =   12.51  max =   16.86  avg =   15.01
           googlenet  min =   44.96  max =   46.36  avg =   45.63
      googlenet_int8  min =  142.52  max =  143.08  avg =  142.77
            resnet18  min =   38.11  max =   38.48  avg =   38.27
       resnet18_int8  min =  113.07  max =  113.36  avg =  113.20
             alexnet  min =  104.26  max =  107.29  avg =  106.36
               vgg16  min =  355.68  max =  365.14  avg =  359.93
          vgg16_int8  min =  653.95  max =  655.73  avg =  654.62
            resnet50  min =   89.33  max =   89.62  avg =   89.42
       resnet50_int8  min =  258.52  max =  259.30  avg =  258.95
      squeezenet_ssd  min =   79.26  max =   81.12  avg =   79.93
 squeezenet_ssd_int8  min =   90.63  max =   91.12  avg =   90.96
       mobilenet_ssd  min =   47.66  max =   48.79  avg =   48.37
  mobilenet_ssd_int8  min =  121.09  max =  121.67  avg =  121.43
      mobilenet_yolo  min =   83.42  max =   93.79  avg =   90.93
  mobilenetv2_yolov3  min =   79.52  max =   80.46  avg =   80.08
         yolov4-tiny  min =  118.95  max =  122.30  avg =  120.16

with this patch

dipper:/data/pixel # ./benchncnn-patch 4 1 0 0 1
[0 Adreno (TM) 630]  queueC=0[3]  queueG=0[3]  queueT=0[3]
[0 Adreno (TM) 630]  bugsbn1=0  bugcopc=0  bugihfa=1
[0 Adreno (TM) 630]  fp16p=1  fp16s=0  fp16a=1  int8s=0  int8a=0
[0 Adreno (TM) 630]  subgroup=64  basic=1  vote=1  ballot=0  shuffle=0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = 0
cooling_down = 1
          squeezenet  min =   37.57  max =   39.05  avg =   38.24
     squeezenet_int8  min =   35.98  max =   36.45  avg =   36.23
           mobilenet  min =   19.05  max =   19.89  avg =   19.62
      mobilenet_int8  min =   75.05  max =   75.23  avg =   75.16
        mobilenet_v2  min =   35.27  max =   38.33  avg =   37.12
        mobilenet_v3  min =   46.79  max =   51.23  avg =   49.20
          shufflenet  min =   34.38  max =   38.46  avg =   35.81
       shufflenet_v2  min =   34.57  max =   41.21  avg =   39.29
             mnasnet  min =   41.65  max =   54.24  avg =   46.39
     proxylessnasnet  min =   54.64  max =   57.41  avg =   55.66
     efficientnet_b0  min =   32.94  max =   35.36  avg =   34.11
        regnety_400m  min =   24.47  max =   25.82  avg =   25.14
           blazeface  min =   15.69  max =   17.06  avg =   16.12
           googlenet  min =   45.42  max =   46.80  avg =   45.87
      googlenet_int8  min =  142.79  max =  143.77  avg =  143.19
            resnet18  min =   38.87  max =   39.53  avg =   39.21
       resnet18_int8  min =  113.20  max =  113.54  avg =  113.37
             alexnet  min =  102.20  max =  103.49  avg =  102.93
               vgg16  min =  350.30  max =  358.39  avg =  353.06
          vgg16_int8  min =  652.70  max =  655.18  avg =  653.65
            resnet50  min =   88.65  max =   96.95  avg =   90.93
       resnet50_int8  min =  257.54  max =  258.05  avg =  257.89
      squeezenet_ssd  min =   65.74  max =   82.69  avg =   77.16
 squeezenet_ssd_int8  min =   90.40  max =   91.81  avg =   91.05
       mobilenet_ssd  min =   51.38  max =   52.01  avg =   51.83
  mobilenet_ssd_int8  min =  121.35  max =  122.21  avg =  121.71
      mobilenet_yolo  min =   92.93  max =   94.01  avg =   93.46
  mobilenetv2_yolov3  min =   76.35  max =   82.29  avg =   79.69
         yolov4-tiny  min =  119.77  max =  120.89  avg =  120.36

CPU results

without this patch

dipper:/data/pixel # ./benchncnn  4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   21.97  max =   22.06  avg =   22.03
     squeezenet_int8  min =   35.34  max =   35.67  avg =   35.45
           mobilenet  min =   33.24  max =   33.43  avg =   33.34
      mobilenet_int8  min =   74.43  max =   74.85  avg =   74.71
        mobilenet_v2  min =   22.21  max =   22.98  avg =   22.48
        mobilenet_v3  min =   18.33  max =   18.48  avg =   18.42
          shufflenet  min =   12.25  max =   12.42  avg =   12.33
       shufflenet_v2  min =   12.88  max =   13.03  avg =   12.95
             mnasnet  min =   21.36  max =   21.58  avg =   21.48
     proxylessnasnet  min =   24.86  max =   25.57  avg =   25.12
     efficientnet_b0  min =   50.46  max =   51.00  avg =   50.75
        regnety_400m  min =   31.85  max =   32.05  avg =   31.95
           blazeface  min =    5.67  max =    5.92  avg =    5.83
           googlenet  min =   99.35  max =  100.40  avg =   99.68
      googlenet_int8  min =  141.38  max =  142.55  avg =  141.86
            resnet18  min =   77.94  max =   79.26  avg =   78.77
       resnet18_int8  min =  132.99  max =  133.94  avg =  133.38
             alexnet  min =   77.82  max =   78.12  avg =   77.94
               vgg16  min =  403.46  max =  405.19  avg =  404.22
          vgg16_int8  min =  674.93  max =  740.70  avg =  707.83
            resnet50  min =  185.49  max =  189.83  avg =  187.08
       resnet50_int8  min =  257.94  max =  258.82  avg =  258.27
      squeezenet_ssd  min =   75.35  max =   76.14  avg =   75.74
 squeezenet_ssd_int8  min =   90.09  max =   90.51  avg =   90.27
       mobilenet_ssd  min =   80.02  max =   81.00  avg =   80.42
  mobilenet_ssd_int8  min =  120.70  max =  121.08  avg =  120.88
      mobilenet_yolo  min =  150.72  max =  151.86  avg =  151.09
  mobilenetv2_yolov3  min =   80.38  max =   81.07  avg =   80.86
         yolov4-tiny  min =  127.19  max =  128.91  avg =  128.01

with this patch

dipper:/data/pixel # ./benchncnn-patch   4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   22.05  max =   22.17  avg =   22.09
     squeezenet_int8  min =   35.25  max =   36.27  avg =   35.63
           mobilenet  min =   33.31  max =   34.18  avg =   33.65
      mobilenet_int8  min =   74.73  max =   74.88  avg =   74.81
        mobilenet_v2  min =   22.33  max =   22.60  avg =   22.43
        mobilenet_v3  min =   18.43  max =   18.70  avg =   18.59
          shufflenet  min =   12.37  max =   12.66  avg =   12.51
       shufflenet_v2  min =   12.94  max =   13.39  avg =   13.17
             mnasnet  min =   21.40  max =   21.74  avg =   21.58
     proxylessnasnet  min =   24.80  max =   25.36  avg =   25.07
     efficientnet_b0  min =   50.81  max =   50.97  avg =   50.90
        regnety_400m  min =   31.70  max =   32.45  avg =   31.98
           blazeface  min =    5.77  max =    5.95  avg =    5.88
           googlenet  min =   99.37  max =  100.41  avg =   99.94
      googlenet_int8  min =  142.13  max =  142.79  avg =  142.45
            resnet18  min =   75.30  max =   76.03  avg =   75.77
       resnet18_int8  min =  112.94  max =  113.57  avg =  113.19
             alexnet  min =   77.88  max =   77.99  avg =   77.92
               vgg16  min =  403.07  max =  409.62  avg =  405.62
          vgg16_int8  min =  653.91  max =  670.21  avg =  661.02
            resnet50  min =  184.12  max =  186.48  avg =  185.74
       resnet50_int8  min =  259.21  max =  259.87  avg =  259.54
      squeezenet_ssd  min =   76.01  max =   76.79  avg =   76.47
 squeezenet_ssd_int8  min =   90.12  max =   90.86  avg =   90.52
       mobilenet_ssd  min =   79.95  max =   80.58  avg =   80.34
  mobilenet_ssd_int8  min =  120.72  max =  121.12  avg =  120.90
      mobilenet_yolo  min =  150.75  max =  153.95  avg =  151.98
  mobilenetv2_yolov3  min =   80.36  max =   81.94  avg =   81.11
         yolov4-tiny  min =  127.39  max =  128.96  avg =  128.12
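To read the two CPU runs side by side, the per-model change in average latency can be computed directly from the benchncnn output. The sketch below is a hypothetical helper, not part of ncnn; it assumes the `name  min = X  max = Y  avg = Z` line format printed above (times in ms) and uses three rows copied verbatim from the Snapdragon 845 CPU results.

```python
import re

# Matches benchncnn result lines such as:
#   "       resnet18_int8  min =  132.99  max =  133.94  avg =  133.38"
# Header lines like "loop_count = 4" do not match and are skipped.
LINE_RE = re.compile(
    r"^\s*(\S+)\s+min\s*=\s*([\d.]+)\s+max\s*=\s*([\d.]+)\s+avg\s*=\s*([\d.]+)"
)


def parse_avg(text):
    """Return {model_name: avg_ms} for every benchncnn result line in text."""
    result = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            result[m.group(1)] = float(m.group(4))
    return result


def compare(before_text, after_text):
    """Return {model_name: percent change in avg latency}; negative = faster."""
    before = parse_avg(before_text)
    after = parse_avg(after_text)
    return {
        name: (after[name] - before[name]) / before[name] * 100.0
        for name in before
        if name in after
    }


# Rows copied from the "without this patch" CPU run above.
before = """
       resnet18_int8  min =  132.99  max =  133.94  avg =  133.38
          vgg16_int8  min =  674.93  max =  740.70  avg =  707.83
          squeezenet  min =   21.97  max =   22.06  avg =   22.03
"""

# The same models from the "with this patch" CPU run above.
after = """
       resnet18_int8  min =  112.94  max =  113.57  avg =  113.19
          vgg16_int8  min =  653.91  max =  670.21  avg =  661.02
          squeezenet  min =   22.05  max =   22.17  avg =   22.09
"""

# Print models from largest speedup to largest slowdown.
for name, pct in sorted(compare(before, after).items(), key=lambda kv: kv[1]):
    print(f"{name:>20s} {pct:+6.1f}%")
```

A negative value means the patched build was faster: resnet18_int8 goes from 133.38 ms to 113.19 ms, roughly 15% lower. With only 4 loops per model, small differences (such as squeezenet's +0.06 ms) may be within run-to-run noise.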

Review thread on src/gpu.cpp (outdated, resolved).
nihui closed and reopened this pull request three times on Oct 11, 2023.
github-actions bot added the core label Oct 11, 2023
codecov-commenter commented Oct 11, 2023

Codecov Report

Merging #2483 (4c3fbb5) into master (f47fbcb) will increase coverage by 6.79%.
Report is 1146 commits behind head on master.
The diff coverage is 37.03%.

@@             Coverage Diff             @@
##           master    #2483       +/-   ##
===========================================
+ Coverage   87.63%   94.42%    +6.79%     
===========================================
  Files         478      787      +309     
  Lines       92160   264046   +171886     
===========================================
+ Hits        80766   249338   +168572     
- Misses      11394    14708     +3314     
Files Coverage Δ
src/gpu.cpp 81.66% <50.00%> (+5.55%) ⬆️
src/pipeline.cpp 52.08% <36.53%> (-19.53%) ⬇️

... and 927 files with indirect coverage changes

6 participants