Implement Trace #3534

Open: wants to merge 8 commits into develop

cognaiger9
Collaborator

  • Add the Trace operation with forward and backward kernels.
  • Add a driver and gtests for the kernels.
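
For orientation, here is a minimal host-side C++ sketch of what the two kernels compute, assuming a contiguous, row-major 2D input like the shapes benchmarked below. The names `trace_forward` and `trace_backward` are hypothetical: they are not MIOpen's API, and this loop is not how the GPU kernels in this PR are written. Forward sums the main diagonal over min(rows, cols) elements. Because ∂trace(A)/∂A_ij is 1 when i = j and 0 otherwise, backward writes the incoming scalar gradient onto the diagonal and zeros everywhere else.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference (not MIOpen's kernel): trace of a contiguous,
// row-major rows x cols matrix, i.e. the sum of input[i][i] for
// i < min(rows, cols). Accumulates in double to limit rounding error.
double trace_forward(const std::vector<float>& input, std::size_t rows, std::size_t cols)
{
    assert(input.size() == rows * cols);
    const std::size_t n = std::min(rows, cols);
    double sum = 0.0;
    for(std::size_t i = 0; i < n; ++i)
        sum += input[i * cols + i]; // diagonal element (i, i)
    return sum;
}

// Hypothetical reference backward: the input gradient is grad_output on the
// main diagonal and zero everywhere else.
std::vector<float> trace_backward(float grad_output, std::size_t rows, std::size_t cols)
{
    std::vector<float> grad_input(rows * cols, 0.0f);
    const std::size_t n = std::min(rows, cols);
    for(std::size_t i = 0; i < n; ++i)
        grad_input[i * cols + i] = grad_output;
    return grad_input;
}
```

For a 2 x 3 input {1, 2, 3, 4, 5, 6}, `trace_forward` returns 1 + 5 = 6, and `trace_backward(g, 2, 3)` places `g` at flat indices 0 and 4.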

Average improvement over ROCm (ROCm time / MIOpen time)

| dtype | fwd | bwd |
|---|---|---|
| float16 | 2.35 | 3.07 |
| float32 | 2.97 | 3.3 |
| bfloat16 | 2.44 | 3.84 |
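
The per-shape Improvement values in the detailed benchmark are ROCm time divided by MIOpen time (for example, 15808 / 6080 = 2.60 for float16 forward [34, 4]; the timing unit is not stated). The sketch below assumes the summary figure is the arithmetic mean of those per-shape ratios, and with that assumption it reproduces the float16 forward average of 2.35 from the ten float16 forward rows. `BenchRow`, `improvement` and `average_improvement` are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// One benchmark row: baseline (ROCm) and MIOpen timings for a single shape,
// both in the same (unstated) unit.
struct BenchRow
{
    double rocm;
    double miopen;
};

// Per-shape improvement as reported in the tables: ROCm time / MIOpen time.
double improvement(const BenchRow& row)
{
    assert(row.miopen > 0.0);
    return row.rocm / row.miopen;
}

// Assumed aggregation: arithmetic mean of the per-shape improvements.
double average_improvement(const std::vector<BenchRow>& rows)
{
    if(rows.empty())
        return 0.0;
    double sum = 0.0;
    for(const BenchRow& row : rows)
        sum += improvement(row);
    return sum / static_cast<double>(rows.size());
}

// float16 forward rows from the detailed benchmark as {ROCm, MIOpen}.
const std::vector<BenchRow> fp16_fwd = {
    {15808, 6080},  {15824, 6009},  {13008, 6080},  {12464, 7733},  {15168, 7840},
    {11888, 7662},  {31967, 22952}, {65006, 25352}, {47343, 23130}, {118285, 23521}};
```

Averaging the ratios (rather than dividing total ROCm time by total MIOpen time) weights every shape equally, so a single large speedup such as the [16384, 16384] case pulls the mean up noticeably.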

Detailed benchmark

float16 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|
| Trace | float16 | [34, 4] | contiguous | fwd | 15808 | 6080 | 2.60 |
| Trace | float16 | [98, 4] | contiguous | fwd | 15824 | 6009 | 2.63 |
| Trace | float16 | [190, 4] | contiguous | fwd | 13008 | 6080 | 2.14 |
| Trace | float16 | [249, 128] | contiguous | fwd | 12464 | 7733 | 1.61 |
| Trace | float16 | [349, 222] | contiguous | fwd | 15168 | 7840 | 1.93 |
| Trace | float16 | [451, 128] | contiguous | fwd | 11888 | 7662 | 1.55 |
| Trace | float16 | [2048, 20480] | contiguous | fwd | 31967 | 22952 | 1.39 |
| Trace | float16 | [4096, 45960] | contiguous | fwd | 65006 | 25352 | 2.56 |
| Trace | float16 | [8192, 8192] | contiguous | fwd | 47343 | 23130 | 2.05 |
| Trace | float16 | [16384, 16384] | contiguous | fwd | 118285 | 23521 | 5.03 |

float16 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|
| Trace | float16 | [34, 4] | contiguous | bwd | 92958 | 24766 | 3.75 |
| Trace | float16 | [98, 4] | contiguous | bwd | 99566 | 29033 | 3.43 |
| Trace | float16 | [190, 4] | contiguous | bwd | 107310 | 26544 | 4.04 |
| Trace | float16 | [249, 128] | contiguous | bwd | 107614 | 24979 | 4.31 |
| Trace | float16 | [349, 222] | contiguous | bwd | 67825 | 23966 | 2.83 |
| Trace | float16 | [451, 128] | contiguous | bwd | 65054 | 27468 | 2.37 |
| Trace | float16 | [603, 546] | contiguous | bwd | 74990 | 30757 | 2.44 |
| Trace | float16 | [1024, 10240] | contiguous | bwd | 105630 | 67133 | 1.57 |
| Trace | float16 | [1024, 1024] | contiguous | bwd | 90894 | 25708 | 3.54 |
| Trace | float16 | [2048, 2048] | contiguous | bwd | 91998 | 37726 | 2.44 |

float32 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|
| Trace | float32 | [34, 4] | contiguous | fwd | 15936 | 5653 | 2.82 |
| Trace | float32 | [98, 4] | contiguous | fwd | 14784 | 5796 | 2.55 |
| Trace | float32 | [190, 4] | contiguous | fwd | 13808 | 5458 | 2.53 |
| Trace | float32 | [249, 128] | contiguous | fwd | 12576 | 7556 | 1.66 |
| Trace | float32 | [349, 222] | contiguous | fwd | 14944 | 7520 | 1.99 |
| Trace | float32 | [451, 128] | contiguous | fwd | 10736 | 7840 | 1.37 |
| Trace | float32 | [2048, 20480] | contiguous | fwd | 46959 | 25761 | 1.82 |
| Trace | float32 | [4096, 45960] | contiguous | fwd | 79550 | 21192 | 3.75 |
| Trace | float32 | [4096, 4096] | contiguous | fwd | 28544 | 22721 | 1.26 |
| Trace | float32 | [8192, 8192] | contiguous | fwd | 68670 | 22792 | 3.01 |
| Trace | float32 | [16384, 16384] | contiguous | fwd | 208124 | 21067 | 9.88 |

float32 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|
| Trace | float32 | [34, 4] | contiguous | bwd | 98350 | 24090 | 4.08 |
| Trace | float32 | [98, 4] | contiguous | bwd | 97083 | 22277 | 4.36 |
| Trace | float32 | [190, 4] | contiguous | bwd | 97902 | 23574 | 4.15 |
| Trace | float32 | [249, 128] | contiguous | bwd | 98942 | 20908 | 4.73 |
| Trace | float32 | [349, 222] | contiguous | bwd | 72878 | 24890 | 2.93 |
| Trace | float32 | [451, 128] | contiguous | bwd | 63966 | 28748 | 2.23 |
| Trace | float32 | [603, 546] | contiguous | bwd | 73470 | 27468 | 2.67 |
| Trace | float32 | [1024, 10240] | contiguous | bwd | 157788 | 67115 | 2.35 |
| Trace | float32 | [1024, 1024] | contiguous | bwd | 106190 | 30864 | 3.44 |
| Trace | float32 | [2048, 2048] | contiguous | bwd | 148285 | 37140 | 3.99 |
| Trace | float32 | [4096, 4096] | contiguous | bwd | 144429 | 99188 | 1.46 |

bfloat16 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|
| Trace | bfloat16 | [34, 4] | contiguous | fwd | 16368 | 5902 | 2.77 |
| Trace | bfloat16 | [98, 4] | contiguous | fwd | 16304 | 6382 | 2.55 |
| Trace | bfloat16 | [190, 4] | contiguous | fwd | 13152 | 5867 | 2.24 |
| Trace | bfloat16 | [249, 128] | contiguous | fwd | 13760 | 7662 | 1.80 |
| Trace | bfloat16 | [349, 222] | contiguous | fwd | 13536 | 7698 | 1.76 |
| Trace | bfloat16 | [451, 128] | contiguous | fwd | 12512 | 8071 | 1.55 |
| Trace | bfloat16 | [2048, 20480] | contiguous | fwd | 32127 | 21281 | 1.51 |
| Trace | bfloat16 | [4096, 45960] | contiguous | fwd | 65182 | 22970 | 2.84 |
| Trace | bfloat16 | [8192, 8192] | contiguous | fwd | 47471 | 23539 | 2.02 |
| Trace | bfloat16 | [16384, 16384] | contiguous | fwd | 119613 | 22490 | 5.32 |

bfloat16 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|
| Trace | bfloat16 | [34, 4] | contiguous | bwd | 131661 | 25566 | 5.15 |
| Trace | bfloat16 | [98, 4] | contiguous | bwd | 123069 | 23930 | 5.14 |
| Trace | bfloat16 | [190, 4] | contiguous | bwd | 130157 | 22704 | 5.73 |
| Trace | bfloat16 | [249, 128] | contiguous | bwd | 119470 | 24766 | 4.82 |
| Trace | bfloat16 | [349, 222] | contiguous | bwd | 85134 | 24641 | 3.45 |
| Trace | bfloat16 | [451, 128] | contiguous | bwd | 82382 | 23610 | 3.49 |
| Trace | bfloat16 | [603, 546] | contiguous | bwd | 84830 | 33762 | 2.51 |
| Trace | bfloat16 | [1024, 10240] | contiguous | bwd | 103645 | 68466 | 1.51 |
| Trace | bfloat16 | [1024, 1024] | contiguous | bwd | 97166 | 25548 | 3.80 |
| Trace | bfloat16 | [2048, 2048] | contiguous | bwd | 107102 | 38651 | 2.77 |
