Enable swizzling SMEM for transposed dot operand #474

htyu · 2024-01-18T18:48:23Z

A transposed operand is accessed in the opposite order from the original operand, so enabling swizzling appears to help performance. I'm seeing a 10% performance improvement on our internal model.
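To make the access-order argument concrete, here is a minimal Python sketch that counts how many reads of one column of a row-major fp32 tile land in the same shared-memory bank, with and without an XOR swizzle. It assumes 32 banks that are 4 bytes wide and uses a vec/perPhase/maxPhase parameterization in the style of Triton's swizzled shared layout; the helper names and the parameter values (1, 1, 32) are illustrative, not what the compiler actually picks for MFMA operands.

```python
from collections import Counter

NUM_BANKS = 32  # assumed: 32 banks, 4 bytes each (one fp32 element per bank)

def swizzled_col(row, col, vec, per_phase, max_phase):
    """XOR-swizzle a column index (hypothetical helper, Triton-style params).

    Columns are grouped into vectors of `vec` elements; every `per_phase`
    consecutive rows share one phase, and phases cycle modulo `max_phase`.
    """
    phase = (row // per_phase) % max_phase
    return ((col // vec) ^ phase) * vec + col % vec

def bank(row, col, row_len, swizzle=None):
    """Bank hit by element (row, col) of a row-major fp32 tile (simplified)."""
    if swizzle is not None:
        col = swizzled_col(row, col, *swizzle)
    return (row * row_len + col) % NUM_BANKS

def max_conflict(accesses):
    """Worst-case number of accesses that land in the same bank."""
    return max(Counter(accesses).values())

ROWS = COLS = 32
# Transposed access: 32 lanes each read one element of column 0.
column = [(r, 0) for r in range(ROWS)]

plain = max_conflict(bank(r, c, COLS) for r, c in column)
swz = max_conflict(bank(r, c, COLS, swizzle=(1, 1, 32)) for r, c in column)
print(plain, swz)  # 32 1
```

Without swizzling every element of the column maps to bank 0 (a 32-way conflict); XOR-ing the column with the row's phase spreads the same column across all 32 banks, while each row remains a permutation of its original columns.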

zhanglx13 · 2024-01-23T17:35:35Z

include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td

@@ -131,6 +131,8 @@ compared to 1*64 when the hasLeadingOffset is false.

        if (mfmaEnc) {
          int kDimNum = dotOpEnc.getOpIdx() == 0 ? 1 : 0;
+          if (needTrans)


@htyu Is this for tt.trans?
In other cases, we don't set the needTrans field when creating a sharedLayout.

htyu · Jan 23, 2024

Yes, it's for tt.trans, as in the example below. When creating a shared encoding, we look ahead to see whether there is a transpose before the dot.

qk = tl.dot(q, tl.trans(k), allow_tf32=ALLOW_TF32) * alpha
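The effect of looking ahead for tl.trans can be sketched as follows. This is a hypothetical Python model, not Triton code: the diff hunk above only shows the `if (needTrans)` line, so it assumes the added branch flips which dimension of the stored tile is treated as K, and that swizzling is enabled only when K is the innermost (contiguous) dimension, i.e. `order[0]`. Under those assumptions, the `k` tile in the snippet above, loaded as [N, K] and transposed into the dot, gets swizzled, whereas an untransposed [K, N] B operand does not.

```python
def k_dim(op_idx, need_trans):
    """Index of the K (reduction) dim of a dot operand's shared tile.

    Operand A (op_idx 0) is [M, K], so K is dim 1; operand B (op_idx 1)
    is [K, N], so K is dim 0. If the tile reaches the dot through a
    transpose, its stored dims are swapped, so K flips (assumed).
    """
    k = 1 if op_idx == 0 else 0
    if need_trans:
        k = 1 - k
    return k

def swizzle_enabled(op_idx, need_trans, order):
    """Assumed rule: swizzle only when K is the fastest-varying dim."""
    return order[0] == k_dim(op_idx, need_trans)

ROW_MAJOR = [1, 0]  # dim 1 is contiguous in shared memory

print(swizzle_enabled(1, False, ROW_MAJOR))  # False: plain [K, N] B operand
print(swizzle_enabled(1, True, ROW_MAJOR))   # True: [N, K] tile fed via trans
```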

zhanglx13 · Jan 23, 2024

I've learned that tl.trans is a very tricky op. Let me clarify: is the 10% improvement due to enabling swizzling?

htyu · Jan 23, 2024

Yes. This change speeds up our model, which contains the trans and dot operations above, by 10%.

htyu · Jan 23, 2024 (edited)

BTW, if I understand correctly, tl.trans is used quite often in the attention kernels of transformer models.

zhanglx13 · Jan 23, 2024

cc+ @vgokhale @scxiao

scxiao · Jan 23, 2024

With this change, do we also need to change the lines at https://github.com/htyu/triton/blob/ab3aafb4a9158a60a4c3085a13b605f70488d6ff/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td#L166-L167? We now have scenarios with non-square tiles.

zhanglx13 · Jan 23, 2024

No, they should be fine. The change here is only used to enable swizzling for the operands of dot, whereas getMDim and getNDim give the shape of the dot's result.
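For the non-square case, the following plain-Python shape bookkeeping (hypothetical helpers, not Triton APIs) illustrates the distinction: M and N describe the dot result, while K lives on the operands, so a tile with BLOCK_M != BLOCK_N changes the result shape without changing which operand dimension is K. The tile sizes are illustrative.

```python
def trans(shape):
    """Shape of a 2-D tile after a transpose (mirrors tl.trans)."""
    rows, cols = shape
    return (cols, rows)

def dot_result_shape(a_shape, b_shape):
    """[M, K] x [K, N] -> [M, N]; the K extents must match."""
    (m, k_a), (k_b, n) = a_shape, b_shape
    if k_a != k_b:
        raise ValueError(f"K mismatch: {k_a} vs {k_b}")
    return (m, n)

BLOCK_M, BLOCK_N, HEAD_DIM = 128, 64, 32  # hypothetical non-square tile
q_shape = (BLOCK_M, HEAD_DIM)  # operand A, [M, K]
k_shape = (BLOCK_N, HEAD_DIM)  # loaded tile, [N, K]; fed to dot via trans

m_dim, n_dim = dot_result_shape(q_shape, trans(k_shape))
print(m_dim, n_dim)  # 128 64
```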

scxiao · Jan 23, 2024

Never mind, this is related to the MFMA layout.

A transposed operand is accessed in the opposite order from the original operand, so enabling swizzling appears to help performance. I'm seeing a 10% performance improvement on our internal model. This is a backport of ROCm#474.

Enable swizzling SMEM for transposed operand

417358f

htyu changed the title ~~Enable swizzling SMEM for transposed operand~~ Enable swizzling SMEM for transposed dot operand Jan 18, 2024

Merge branch 'triton-mlir' into hoy/trans

bbe7679

alefimov-amd requested a review from zhanglx13 January 19, 2024 16:31

zhanglx13 reviewed Jan 23, 2024

zhanglx13 self-requested a review January 23, 2024 18:03

zhanglx13 approved these changes Jan 23, 2024

Merge branch 'triton-mlir' into hoy/trans

ab3aafb

zhanglx13 merged commit 6141b10 into ROCm:triton-mlir Jan 23, 2024
2 checks passed

htyu mentioned this pull request Apr 15, 2024

[BACKEND][AMD] Enable swizzling SMEM for transposed operand triton-lang/triton#3666

Merged
