[BACKEND] Implement generic swizzling when lowering convert_layout #6982
Conversation
I'll run benchmarks and do a couple minor clean-ups tomorrow. Will also add a couple lit tests, although there is already one for the fp8 transpose which shows that we can indeed vectorise it.
@@ -68,7 +68,7 @@ tt.func @matmul_loop(%lb : index, %ub : index, %step : index, %A : !tt.ptr<f16>,
     // Shared memory is available after a tensor's liveness range ends
     // expected-remark @below {{reusable}}
-    // expected-remark @below {{size = 4608}}
+    // expected-remark @below {{size = 8192}}
Seems like shared memory usage has increased a lot.
These often come from being able to vectorise more than before (and, as such, not being able to do as many reps).
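As a back-of-the-envelope illustration (all numbers below are hypothetical, not taken from this test): if the scratch buffer holds one rep of the tensor at a time, then halving the number of reps doubles the scratch size.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sizing arithmetic: scratch bytes = totalElems / reps * elemBytes,
// so wider vectorisation (fewer reps) means a larger scratch buffer per rep.
int main() {
  const int64_t totalElems = 64 * 64; // assumed tensor size
  const int64_t elemBytes = 2;        // assumed fp16 elements
  for (int reps : {4, 2}) {           // fewer reps <=> wider vectorised accesses
    int64_t scratchBytes = totalElems / reps * elemBytes;
    std::printf("reps = %d -> scratch = %lld bytes\n", reps,
                static_cast<long long>(scratchBytes));
  }
  return 0;
}
```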
  return smem.getTotalOutDimSize() / reps;
}

static unsigned getNumScratchElemsPaddedCvt(RankedTensorType srcTy,
Is it there only for the isStMatrix case?
yep
  auto logBankConflicts = std::min<int32_t>(
      std::max<int32_t>(0, lenSegment - A.size() - segment.size()), A.size());
  // Conflict-free
  for (int i = logBankConflicts; i < A.size(); ++i)
This ^ operator here isn't clear to me, but we can chat offline.
This part is in the explanation of the algorithm in the paper, but yes, I agree it is quite a tricky part.
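For a rough sense of the clamping arithmetic only (this is my reading of the snippet, with made-up sizes, not the paper's definition): the expression bounds the number of bases of A that may introduce bank conflicts to the range [0, A.size()].

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical worked example of the clamp in the snippet above; the concrete
// sizes of lenSegment, A, and segment are invented for illustration.
int main() {
  int32_t lenSegment = 10;          // assumed segment length (in bits)
  std::vector<int32_t> A(5);        // assume 5 basis vectors
  std::vector<int32_t> segment(3);  // assume 3 basis vectors

  int32_t logBankConflicts = std::min<int32_t>(
      std::max<int32_t>(0, lenSegment - static_cast<int32_t>(A.size()) -
                               static_cast<int32_t>(segment.size())),
      static_cast<int32_t>(A.size()));

  // min(max(0, 10 - 5 - 3), 5) == 2: the first two bases of A may conflict,
  // and the "conflict-free" loop starts at i = logBankConflicts = 2.
  std::printf("logBankConflicts = %d\n", logBankConflicts);
  return 0;
}
```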
We implement a generic swizzling algorithm by @apgoucher that, given two linear layouts, finds the optimal shared memory layout that maximises read/write vectorisation and, subject to that, minimises bank conflicts.
We also implement an algorithm to find the minimum tile size necessary to perform the convert_layout given the restrictions above, and we use it to perform the convert_layout iteratively. This PR does not yet implement a lowering to ldmatrix/stmatrix; we'll do that in a future PR.
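For intuition only, here is a toy CPU model of that iterative structure (not the actual GPU lowering; sizes, names, and layouts are made up): each rep is staged through a small scratch tile, written with the source layout and read back with the destination layout. On the GPU the tile lives in shared memory and its layout is the swizzled one chosen by the algorithm.

```cpp
#include <array>
#include <cstdio>

// Toy model: convert a 4x4 tensor from row-major to column-major order
// through a 2x4 scratch tile, one "rep" (pair of rows) at a time.
int main() {
  constexpr int M = 4, N = 4, TileM = 2;       // assumed sizes
  std::array<int, M * N> src{}, dst{};
  for (int i = 0; i < M * N; ++i) src[i] = i;  // row-major input

  std::array<int, TileM * N> tile{};           // stand-in for the shared-memory tile
  for (int rep = 0; rep < M / TileM; ++rep) {
    // "Store" phase: write this rep into the tile with the source layout.
    for (int i = 0; i < TileM; ++i)
      for (int j = 0; j < N; ++j)
        tile[i * N + j] = src[(rep * TileM + i) * N + j];
    // "Load" phase: read the tile back with the destination (column-major) layout.
    for (int j = 0; j < N; ++j)
      for (int i = 0; i < TileM; ++i)
        dst[j * M + rep * TileM + i] = tile[i * N + j];
  }

  for (int v : dst) std::printf("%d ", v);     // prints the transposed element order
  std::printf("\n");
  return 0;
}
```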