Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CosineSimilarity #3538

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from
Open

Implement CosineSimilarity #3538

wants to merge 8 commits into from

Conversation

cognaiger9
Copy link
Collaborator

  • Add CosineSimilarity operation with forward and backward kernels.
  • Add driver and gtest for kernels.
  • MIOpen performs better if:
    • Number of output elements exceeds 20000

Average improvement over ROCm

type fwd bwd
float16 1.86 2.72
float 1.69 1.82
bfloat16 2.04 1.96

Detail Benchmark

float16
op_name dtype input1 input2 contiguous direction ROCm MIOpen MIOpen vs ROCm
Cosine Similarities float16 [100 512 200] [100 512 200] contiguous fwd 426253 347020 1.23
Cosine Similarities float16 [100 512 200] [100 512 200] contiguous bwd 1462341 947513 1.54
Cosine Similarities float16 [100 1024 200] [100 1024 200] contiguous fwd 798794 688528 1.16
Cosine Similarities float16 [100 1024 200] [100 1024 200] contiguous bwd 2784315 1897090 1.47
Cosine Similarities float16 [48 512 512] [48 512 512] contiguous fwd 515276 352034 1.46
Cosine Similarities float16 [48 512 512] [48 512 512] contiguous bwd 1786899 966217 1.85
Cosine Similarities float16 [48 512 512] [48 512 512] noncontiguous fwd 2821500 1338300 2.11
Cosine Similarities float16 [48 512 512] [48 512 512] noncontiguous bwd 6846559 4324440 1.58
Cosine Similarities float16 [48 1024 512] [48 1024 512] noncontiguous fwd 5630840 3726720 1.51
Cosine Similarities float16 [48 1024 512] [48 1024 512] noncontiguous bwd 14110539 10911400 1.29
Cosine Similarities float16 [48 2048 512] [48 2048 512] contiguous fwd 2693597 1422960 1.89
Cosine Similarities float16 [48 2048 512] [48 2048 512] contiguous bwd 7302252 3825210 1.91
Cosine Similarities float16 [48 2048 512] [48 2048 512] noncontiguous fwd 10481655 8927840 1.17
Cosine Similarities float16 [48 2048 512] [48 2048 512] noncontiguous bwd 33497829 27936900 1.20
Cosine Similarities float16 [8192 512 8] [8192 512 8] contiguous fwd 1855731 760218 2.44
Cosine Similarities float16 [8192 512 8] [8192 512 8] contiguous bwd 4881614 3379450 1.44
Cosine Similarities float16 [8192 512 8] [8192 512 8] noncontiguous fwd 3529784 890530 3.96
Cosine Similarities float16 [8192 512 8] [8192 512 8] noncontiguous bwd 10447912 3381180 3.09
Cosine Similarities float16 [8192 512 16] [8192 512 16] contiguous fwd 2646878 1552040 1.71
Cosine Similarities float16 [8192 512 16] [8192 512 16] contiguous bwd 8911075 4909360 1.82
float32
op_name dtype input1 input2 contiguous direction ROCm MIOpen MIOpen vs ROCm
Cosine Similarities float32 [100 512 200] [100 512 200] contiguous fwd 577659 370984 1.56
Cosine Similarities float32 [100 512 200] [100 512 200] contiguous bwd 1849378 1052470 1.76
Cosine Similarities float32 [100 1024 200] [100 1024 200] contiguous fwd 1136999 746626 1.52
Cosine Similarities float32 [100 1024 200] [100 1024 200] contiguous bwd 3618709 2147220 1.69
Cosine Similarities float32 [100 2048 200] [100 2048 200] contiguous fwd 2632380 1524860 1.73
Cosine Similarities float32 [100 2048 200] [100 2048 200] contiguous bwd 8504242 4356880 1.95
Cosine Similarities float32 [48 512 512] [48 512 512] contiguous fwd 837514 383944 2.18
Cosine Similarities float32 [48 512 512] [48 512 512] contiguous bwd 2348079 1042270 2.25
Cosine Similarities float32 [48 512 512] [48 512 512] noncontiguous fwd 3506695 1811310 1.94
Cosine Similarities float32 [48 512 512] [48 512 512] noncontiguous bwd 9166318 5483190 1.67
Cosine Similarities float32 [48 1024 512] [48 1024 512] contiguous fwd 1160312 738841 1.57
Cosine Similarities float32 [48 1024 512] [48 1024 512] contiguous bwd 4295953 2045450 2.10
Cosine Similarities float32 [48 1024 512] [48 1024 512] noncontiguous fwd 6700240 5319740 1.26
Cosine Similarities float32 [48 1024 512] [48 1024 512] noncontiguous bwd 19835299 14577400 1.36
Cosine Similarities float32 [48 2048 512] [48 2048 512] contiguous fwd 2119473 1474480 1.44
Cosine Similarities float32 [48 2048 512] [48 2048 512] contiguous bwd 8352629 4055320 2.06
Cosine Similarities float32 [48 2048 512] [48 2048 512] noncontiguous fwd 12393130 9287640 1.33
Cosine Similarities float32 [48 2048 512] [48 2048 512] noncontiguous bwd 45345268 31751600 1.43
Cosine Similarities float32 [8192 512 8] [8192 512 8] contiguous fwd 2172193 903117 2.41
Cosine Similarities float32 [8192 512 8] [8192 512 8] contiguous bwd 6099446 3222970 1.89
bfloat16
op_name dtype input1 input2 contiguous direction ROCm MIOpen MIOpen vs ROCm
Cosine Similarities bfloat16 [100 1024 200] [100 1024 200] contiguous fwd 846298 693346 1.22
Cosine Similarities bfloat16 [100 1024 200] [100 1024 200] contiguous bwd 2977722 1905840 1.56
Cosine Similarities bfloat16 [48 512 512] [48 512 512] contiguous fwd 520316 354896 1.47
Cosine Similarities bfloat16 [48 512 512] [48 512 512] contiguous bwd 1880802 1017110 1.85
Cosine Similarities bfloat16 [48 512 512] [48 512 512] noncontiguous fwd 2847387 1336790 2.13
Cosine Similarities bfloat16 [48 512 512] [48 512 512] noncontiguous bwd 6875854 4354790 1.58
Cosine Similarities bfloat16 [48 1024 512] [48 1024 512] contiguous fwd 850234 740263 1.15
Cosine Similarities bfloat16 [48 1024 512] [48 1024 512] contiguous bwd 3515447 2027460 1.73
Cosine Similarities bfloat16 [48 1024 512] [48 1024 512] noncontiguous fwd 6123076 3721010 1.65
Cosine Similarities bfloat16 [48 1024 512] [48 1024 512] noncontiguous bwd 14947030 13634400 1.10
Cosine Similarities bfloat16 [8192 512 8] [8192 512 8] contiguous fwd 1918707 745036 2.58
Cosine Similarities bfloat16 [8192 512 8] [8192 512 8] contiguous bwd 5203020 3372960 1.54
Cosine Similarities bfloat16 [8192 512 8] [8192 512 8] noncontiguous fwd 3599208 862476 4.17
Cosine Similarities bfloat16 [8192 512 8] [8192 512 8] noncontiguous bwd 11091124 3497940 3.17
Cosine Similarities bfloat16 [8192 512 16] [8192 512 16] contiguous fwd 2800925 1561570 1.79
Cosine Similarities bfloat16 [8192 512 16] [8192 512 16] contiguous bwd 9585856 4725360 2.03
Cosine Similarities bfloat16 [8192 512 32] [8192 512 32] contiguous fwd 4520643 2975230 1.52
Cosine Similarities bfloat16 [8192 512 32] [8192 512 32] contiguous bwd 18413546 8472450 2.17
Cosine Similarities bfloat16 [4096 512 8] [4096 512 8] contiguous fwd 1258264 469138 2.68
Cosine Similarities bfloat16 [4096 512 8] [4096 512 8] contiguous bwd 3423051 1197620 2.86

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant