
[AllocsToSLM] Add thread-specific offsets #407

Merged: 2 commits into intel:main on Nov 11, 2024
Conversation

@dchigarev (Contributor) commented on Nov 10, 2024

The proper way to use SLM (shared local memory) via memref.alloc() appears to be to allocate a single SLM chunk for all threads in the work-group at once and then slice it per thread. This PR implements that logic in the AllocsToSLM pass.

Example:

// Before the pass
func.func @entry() {
  gpu.launch blocks(%bx, %by, %bz) in (%sz_bx = %c8, %sz_by = %c8, %sz_bz = %c1)
             threads(%tx, %ty, %tz) in (%sz_tx = %c2, %sz_ty = %c4, %sz_tz = %c1) {
    %slm = memref.alloc() : memref<16x32xf16>
    gpu.terminator
  }
  return
}

// After the pass
func.func @entry() {
  gpu.launch blocks(%bx, %by, %bz) in (%sz_bx = %c8, %sz_by = %c8, %sz_bz = %c1)
             threads(%tx, %ty, %tz) in (%sz_tx = %c2, %sz_ty = %c4, %sz_tz = %c1) {
    // Scale the allocation by the thread-grid shape: dim0 16 * 2 = 32, dim1 32 * 4 = 128.
    // This shared allocation (memory space 3 = workgroup/SLM) is made only once per work-group.
    %slm_root = memref.alloc() : memref<32x128xf16, 3>
    // Each thread offsets into the shared buffer by its thread id times the per-thread extent
    // (index constants %c16/%c32 elided for brevity, like %c8 above)
    %off_x = arith.muli %tx, %c16 : index
    %off_y = arith.muli %ty, %c32 : index
    %slm = memref.subview %slm_root[%off_x, %off_y] [16, 32] [1, 1]
         : memref<32x128xf16, 3> to memref<16x32xf16, strided<[128, 1], offset: ?>, 3>
    gpu.terminator
  }
  return
}
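
For readers who want to follow the index arithmetic outside of MLIR, below is a minimal, illustrative C++ sketch of the scaling and slicing shown above. The names SlmSlice and sliceForThread are hypothetical and not part of AllocsToSLM.cpp; the sketch only mirrors the shape-scaling and per-thread offset computation from the example.

#include <array>
#include <cstdint>
#include <iostream>

// Hypothetical helper mirroring the example above: given the per-thread 2-D
// allocation shape and the work-group's thread grid, compute the work-group-wide
// SLM shape and a given thread's offset into it.
struct SlmSlice {
  std::array<int64_t, 2> rootShape; // shape of the shared allocation
  std::array<int64_t, 2> offset;    // this thread's offset into it
};

SlmSlice sliceForThread(std::array<int64_t, 2> threadShape,   // per-thread alloc shape
                        std::array<int64_t, 2> threadsPerWg,  // {sz_tx, sz_ty}
                        std::array<int64_t, 2> threadId) {    // {tx, ty}
  SlmSlice s;
  for (int d = 0; d < 2; ++d) {
    // Scale each dimension by the number of threads along it...
    s.rootShape[d] = threadShape[d] * threadsPerWg[d];
    // ...and offset this thread by its id times the per-thread extent.
    s.offset[d] = threadId[d] * threadShape[d];
  }
  return s;
}

int main() {
  // Matches the example: 16x32 per thread, 2x4 threads -> 32x128 shared root.
  auto s = sliceForThread({16, 32}, {2, 4}, {1, 3});
  std::cout << s.rootShape[0] << "x" << s.rootShape[1] << " root, offset ("
            << s.offset[0] << ", " << s.offset[1] << ")\n"; // 32x128 root, offset (16, 96)
  return 0;
}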

@dchigarev marked this pull request as ready for review on November 10, 2024, 20:23
Review threads on lib/gc/Transforms/GPU/AllocsToSLM.cpp: 3 threads, all resolved (2 marked outdated)
Signed-off-by: dchigarev <[email protected]>
@dchigarev merged commit d9921c4 into intel:main on Nov 11, 2024; 6 checks passed

3 participants