[BACKEND][AMD] Enable swizzling SMEM for transposed operand (#3666)
A transposed operand is accessed in the opposite order from the
original operand, so the K dimension index is flipped when checking
whether K is the inner dimension. Enabling swizzling in this case seems
to help performance: I'm seeing a 10% performance improvement on our
internal model.

This is a backport of ROCm#474.
htyu authored Apr 15, 2024
1 parent 3657381 commit d117047
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td
@@ -230,6 +230,8 @@ compared to 1*64 when the hasLeadingOffset is false.
       // ---- begin GFX908/GFX90A ----
       if (auto mfmaEnc = dotOpEnc.getParent().dyn_cast<AMDMfmaEncodingAttr>()) {
         int kDimNum = dotOpEnc.getOpIdx() == 0 ? 1 : 0;
+        if (needTrans)
+          kDimNum = 1 - kDimNum;
         bool isKDimInner = (order[0] == kDimNum);
         if (isKDimInner) {
           const int numBanks = 32;
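
To make the effect of the two added lines concrete, here is a minimal standalone sketch of the K-dimension check in the hunk above. It is an illustration only, not the code in TritonGPUAttrDefs.td: the free function `isKDimInner` and the use of `std::array` are assumptions made so the logic compiles without MLIR, and the meanings given to `opIdx` (0 for the first dot operand, 1 for the second) and `order[0]` (the fastest-varying dimension) are my reading of the surrounding code.

```cpp
#include <array>

// Hypothetical helper mirroring the check in the hunk above.
//   opIdx     : 0 for the first dot operand (M x K), 1 for the second (K x N).
//   order     : dimension order of the shared-memory layout; order[0] is
//               assumed to be the fastest-varying (innermost) dimension.
//   needTrans : true when the operand is accessed transposed.
bool isKDimInner(int opIdx, const std::array<unsigned, 2> &order,
                 bool needTrans) {
  // Without a transpose, K is dim 1 for operand 0 and dim 0 for operand 1.
  int kDimNum = (opIdx == 0) ? 1 : 0;
  // A transposed operand is accessed in the opposite order, so K moves
  // to the other dimension.
  if (needTrans)
    kDimNum = 1 - kDimNum;
  return order[0] == static_cast<unsigned>(kDimNum);
}
```

Under these assumptions, operand 0 with order {0, 1} fails the check when `needTrans` is false and passes it when `needTrans` is true, which is the case this commit routes into the swizzled path.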
