Implement Diag Forward #3510

Open: wants to merge 31 commits into base: develop

@cognaiger9 (Collaborator) commented Feb 14, 2025

  • Added Diag Forward operations
  • Added driver test and gtest for Diag operations
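
For readers unfamiliar with the operation, the following is a minimal pure-Python reference for what a Diag forward pass computes on a 2-D input. It assumes the operation follows the usual `torch.diag` / `numpy.diag` convention for the `diagonal` offset; the PR description does not spell out the semantics, so treat this as an assumption. The name `diag_forward_reference` is hypothetical and is not part of MIOpen.

```python
def diag_forward_reference(matrix, diagonal=0):
    """Return the `diagonal`-th diagonal of a 2-D list of lists.

    diagonal > 0 selects a diagonal above the main one, diagonal < 0 one
    below it (torch.diag / numpy.diag convention, assumed here).
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    row_start = max(-diagonal, 0)  # negative offsets start further down
    col_start = max(diagonal, 0)   # positive offsets start further right
    length = max(min(rows - row_start, cols - col_start), 0)
    return [matrix[row_start + i][col_start + i] for i in range(length)]


example = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
print(diag_forward_reference(example, 0))   # [1, 5, 9]
print(diag_forward_reference(example, 1))   # [2, 6]
print(diag_forward_reference(example, -1))  # [4, 8]
```

Under this convention, an offset that falls outside the matrix yields an empty result rather than an error.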

The kernel outperforms ROCm by roughly 20% or more only when both of the following constraints hold:

  • The input tensor has exactly 2 dimensions.
  • The input tensor has more than 4096576 elements.
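
The two constraints can be expressed as a simple shape check. This is only an illustrative sketch: `meets_speedup_constraints` is a hypothetical helper, not the solver's actual applicability logic, and the threshold is copied from the description above.

```python
from math import prod

# Threshold quoted in the PR description (equal to 2024 * 2024).
MIN_NUMEL_EXCLUSIVE = 4096576


def meets_speedup_constraints(shape):
    """Hypothetical helper: True when both stated constraints hold."""
    return len(shape) == 2 and prod(shape) > MIN_NUMEL_EXCLUSIVE


print(meets_speedup_constraints((9016, 4048)))  # True: 2-D and large enough
print(meets_speedup_constraints((2024, 2024)))  # False: exactly at the threshold
print(meets_speedup_constraints((4096577,)))    # False: not 2-D
```

All shapes in the benchmark tables below satisfy this check.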

Detailed Benchmark

float16

| Ops name | dtype | size | contiguous | diagonal | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|---|
| Diag | float16 | [9016 4048] | contiguous | -50 | fwd | 7808 | 6026 | 1.30 |
| Diag | float16 | [9016 4048] | noncontiguous | -50 | fwd | 8560 | 6026 | 1.42 |
| Diag | float16 | [9016 4048] | contiguous | 0 | fwd | 7280 | 6026 | 1.21 |
| Diag | float16 | [9016 4048] | noncontiguous | 0 | fwd | 8048 | 5991 | 1.34 |
| Diag | float16 | [9016 9016] | contiguous | -50 | fwd | 10112 | 6381 | 1.58 |
| Diag | float16 | [9016 9016] | noncontiguous | -50 | fwd | 10144 | 6470 | 1.57 |
| Diag | float16 | [9016 9016] | contiguous | 0 | fwd | 10464 | 6399 | 1.64 |
| Diag | float16 | [9016 9016] | noncontiguous | 0 | fwd | 10512 | 6452 | 1.63 |
| Diag | float16 | [18132 9016] | contiguous | -50 | fwd | 10608 | 6416 | 1.65 |
| Diag | float16 | [18132 9016] | noncontiguous | -50 | fwd | 12768 | 6452 | 1.98 |
| Diag | float16 | [18132 9016] | contiguous | 0 | fwd | 10368 | 6381 | 1.62 |
| Diag | float16 | [18132 9016] | noncontiguous | 0 | fwd | 12384 | 6363 | 1.95 |

float32

| Ops name | dtype | size | contiguous | diagonal | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|---|
| Diag | float32 | [9016 4048] | contiguous | -50 | fwd | 8288 | 5937 | 1.40 |
| Diag | float32 | [9016 4048] | noncontiguous | -50 | fwd | 9888 | 5920 | 1.67 |
| Diag | float32 | [9016 4048] | contiguous | 0 | fwd | 7856 | 5991 | 1.31 |
| Diag | float32 | [9016 4048] | noncontiguous | 0 | fwd | 9728 | 5849 | 1.66 |
| Diag | float32 | [9016 9016] | contiguous | -50 | fwd | 13952 | 6523 | 2.14 |
| Diag | float32 | [9016 9016] | noncontiguous | -50 | fwd | 13280 | 6434 | 2.06 |
| Diag | float32 | [9016 9016] | contiguous | 0 | fwd | 14048 | 6666 | 2.11 |
| Diag | float32 | [9016 9016] | noncontiguous | 0 | fwd | 14064 | 6523 | 2.16 |
| Diag | float32 | [18132 9016] | contiguous | -50 | fwd | 14160 | 6523 | 2.17 |
| Diag | float32 | [18132 9016] | noncontiguous | -50 | fwd | 17184 | 6399 | 2.69 |
| Diag | float32 | [18132 9016] | contiguous | 0 | fwd | 13408 | 6541 | 2.05 |
| Diag | float32 | [18132 9016] | noncontiguous | 0 | fwd | 16576 | 6470 | 2.56 |
| Diag | float32 | [36264 18032] | contiguous | -50 | fwd | 19504 | 11057 | 1.76 |
| Diag | float32 | [36264 18032] | noncontiguous | -50 | fwd | 35632 | 13492 | 2.64 |
| Diag | float32 | [36264 18032] | contiguous | 0 | fwd | 19552 | 7484 | 2.61 |
| Diag | float32 | [36264 18032] | noncontiguous | 0 | fwd | 39248 | 13493 | 2.91 |

bfloat16

| Ops name | dtype | size | contiguous | diagonal | direction | ROCm | MIOpen | Improvement (ROCm / MIOpen) |
|---|---|---|---|---|---|---|---|---|
| Diag | bfloat16 | [9016 4048] | contiguous | 0 | fwd | 7040 | 6097 | 1.15 |
| Diag | bfloat16 | [9016 4048] | noncontiguous | 0 | fwd | 7904 | 6471 | 1.22 |
| Diag | bfloat16 | [9016 4048] | contiguous | 50 | fwd | 7136 | 5990 | 1.19 |
| Diag | bfloat16 | [9016 4048] | noncontiguous | 50 | fwd | 8064 | 5794 | 1.39 |
| Diag | bfloat16 | [9016 9016] | contiguous | 0 | fwd | 10320 | 6452 | 1.60 |
| Diag | bfloat16 | [9016 9016] | noncontiguous | 0 | fwd | 10208 | 6594 | 1.55 |
| Diag | bfloat16 | [9016 9016] | contiguous | 50 | fwd | 10384 | 6416 | 1.62 |
| Diag | bfloat16 | [9016 9016] | noncontiguous | 50 | fwd | 10272 | 6523 | 1.57 |
| Diag | bfloat16 | [18132 9016] | contiguous | 0 | fwd | 10416 | 6399 | 1.63 |
| Diag | bfloat16 | [18132 9016] | noncontiguous | 0 | fwd | 12784 | 6417 | 1.99 |
| Diag | bfloat16 | [18132 9016] | contiguous | 50 | fwd | 10608 | 6364 | 1.67 |
| Diag | bfloat16 | [18132 9016] | noncontiguous | 50 | fwd | 12304 | 6381 | 1.93 |
| Diag | bfloat16 | [36264 18032] | contiguous | 0 | fwd | 18048 | 7360 | 2.45 |
| Diag | bfloat16 | [36264 18032] | noncontiguous | 0 | fwd | 24224 | 7288 | 3.32 |
| Diag | bfloat16 | [36264 18032] | contiguous | 50 | fwd | 17248 | 7288 | 2.37 |
| Diag | bfloat16 | [36264 18032] | noncontiguous | 50 | fwd | 24416 | 7271 | 3.36 |

Average improvement (ROCm / MIOpen):

| dtype | fwd |
|---|---|
| float16 | 1.57 |
| float32 | 2.12 |
| bfloat16 | 1.88 |
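
The reported figures are consistent with computing each improvement as the ROCm measurement divided by the MIOpen measurement and then averaging per dtype. The sketch below reproduces the float16 average from the numbers in this PR; the units of the timing columns are not stated in the description, and the variable names are illustrative only.

```python
# (ROCm, MIOpen) pairs copied from the float16 benchmark rows in this PR.
float16_timings = [
    (7808, 6026), (8560, 6026), (7280, 6026), (8048, 5991),
    (10112, 6381), (10144, 6470), (10464, 6399), (10512, 6452),
    (10608, 6416), (12768, 6452), (10368, 6381), (12384, 6363),
]

# Improvement is the ratio of the two measurements: values above 1.0
# mean the MIOpen kernel took less time than the ROCm baseline.
ratios = [rocm / miopen for rocm, miopen in float16_timings]
average = sum(ratios) / len(ratios)

print(f"{ratios[0]:.2f}")  # 1.30
print(f"{average:.2f}")    # 1.57
```

The same calculation applied to the float32 and bfloat16 rows gives the other two averages.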

@cognaiger9 changed the title from "Implement Diag" to "Implement Diag Forward" on Feb 14, 2025