[AMD] Fix uniform offset computation #4678

giuseros · 2024-09-09T11:36:10Z

There was a bug in how we were splitting the uniform/non-uniform offset contribution for addptr.

Consider this IR (where U is a uniform value, e.g., one coming from a splat, and NU is a non-uniform value, e.g., one coming from a make_range):

%a = %U+%NU
%b = %a + %NU
%c = addptr %ptr, %b

It would have been rewritten to

%b = %NU+%NU
%basePtr = addptr %basePtrOld, %U
%c = addi %offset, %b

The main issue here is that %b's operand #0 has changed, i.e., the scalar contribution has been removed. This is fine if addptr is the only operation that uses %b. Any other operation that uses %b, however, still needs the "old" %b.

The solution is to accumulate both the uniform and non-uniform contributions in separate IR and leave the original %b untouched. Possible duplications will be removed by the canonicalizer.
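To illustrate the idea only, here is a toy Python model of the non-mutating split. It is not the pass's C++ code: the `Value` and `Add` classes and the `split` helper are hypothetical names made up for this sketch. The point it shows is that the uniform and non-uniform contributions are collected into new lists while the original expression tree for %b stays intact for its other users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A leaf value; `uniform` is True for splat-like values (hypothetical model)."""
    name: str
    uniform: bool


@dataclass(frozen=True)
class Add:
    """An addition node over two sub-expressions (hypothetical model)."""
    lhs: object
    rhs: object


def split(expr):
    """Return (uniform_terms, non_uniform_terms) without modifying `expr`."""
    if isinstance(expr, Value):
        return ([expr.name], []) if expr.uniform else ([], [expr.name])
    u_lhs, nu_lhs = split(expr.lhs)
    u_rhs, nu_rhs = split(expr.rhs)
    return (u_lhs + u_rhs, nu_lhs + nu_rhs)


U = Value("U", uniform=True)
NU = Value("NU", uniform=False)

a = Add(U, NU)   # %a = %U + %NU
b = Add(a, NU)   # %b = %a + %NU

# The contributions are accumulated separately; `b` itself is never rewritten.
uniform_terms, non_uniform_terms = split(b)
```

In this model, `uniform_terms` would feed the scalar base-pointer update and `non_uniform_terms` the tensor offset, while any other user of `b` still sees `(U + NU) + NU`.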

Doing things this way also let me generalize the pass to all expressions of the form (U+NU)*(U+NU).
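As a sanity check on why a product of this form can still be split, the following Python sketch expands (U1+NU1)*(U2+NU2) into a purely uniform term U1*U2 plus a non-uniform remainder. This is plain arithmetic, not the pass's actual rewrite: the function `split_product` and the per-lane list representation are assumptions for illustration, and the real pass may group the terms differently.

```python
def split_product(u1, nu1, u2, nu2):
    """Split (u1 + nu1) * (u2 + nu2) into a scalar part and a per-lane part.

    u1, u2 are scalars (uniform); nu1, nu2 are per-lane lists (non-uniform).
    """
    uniform = u1 * u2
    # Every remaining term involves at least one non-uniform factor.
    non_uniform = [u1 * y + x * u2 + x * y for x, y in zip(nu1, nu2)]
    return uniform, non_uniform


u1, u2 = 3, 5
nu1 = [0, 1, 2, 3]
nu2 = [10, 20, 30, 40]

uniform, non_uniform = split_product(u1, nu1, u2, nu2)

# Recombining the two parts must match the direct per-lane computation.
direct = [(u1 + x) * (u2 + y) for x, y in zip(nu1, nu2)]
recombined = [uniform + t for t in non_uniform]
```

Here `recombined` equals `direct` on every lane, so only the `u1 * u2` term needs to go into the scalar contribution.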

I tried enabling this pass and running the full suite, and it works fine.

antiagainst

Nice. This is much better than mutating directly; thanks! I just have a few small nits.

third_party/amd/lib/TritonAMDGPUTransforms/CanonicalizePointers.cpp

antiagainst

I addressed the small nits given I need to retrigger the CI after some network issues anyway.

Fix uniform offset computation

b0e4323

giuseros requested review from antiagainst, zhanglx13 and ptillet as code owners September 9, 2024 11:36

giuseros mentioned this pull request Sep 9, 2024

[AMD] Enable masked load and pointer canonicalization pass #4638

Merged

antiagainst requested changes Sep 9, 2024

antiagainst changed the title ~~Fix uniform offset computation~~ [AMD] Fix uniform offset computation Sep 9, 2024

antiagainst added 2 commits September 9, 2024 21:23

Fix a few style issues

40577cc

Merge remote-tracking branch 'origin/main' into fix_canonicalize_bug

c6d90b8

antiagainst approved these changes Sep 9, 2024

antiagainst merged commit 25324a7 into triton-lang:main Sep 9, 2024
7 checks passed

lezcano mentioned this pull request Sep 24, 2024

Implement scaled_dot(mxfp8, fp8) via mma #4795

Merged
