
[AllocsToSLM] Add thread-specific offsets #407

Merged: 2 commits into intel:main on Nov 11, 2024
Conversation

@dchigarev (Contributor) commented on Nov 10, 2024

The proper way to use SLM (shared local memory) via memref.alloc() appears to be to allocate a single SLM chunk for all threads in the work-group at once and then slice it per thread. This PR implements that logic in the AllocsToSLM pass.

Example:

// Before the pass
func.func @entry() {
  gpu.launch blocks(%bx, %by, %bz) in (%sz_bx = %c8, %sz_by = %c8, %sz_bz = %c1)
             threads(%tx, %ty, %tz) in (%sz_tx = %c2, %sz_ty = %c4, %sz_tz = %c1) {
    %slm = memref.alloc() : memref<16x32xf16>
    gpu.terminator
  }
  return
}

// After the pass
func.func @entry() {
  gpu.launch blocks(%bx, %by, %bz) in (%sz_bx = %c8, %sz_by = %c8, %sz_bz = %c1)
             threads(%tx, %ty, %tz) in (%sz_tx = %c2, %sz_ty = %c4, %sz_tz = %c1) {
    // Scale the allocation by the thread-grid shape: dim0 16 * 2 = 32, dim1 32 * 4 = 128.
    // This shared allocation (memory space 3 = workgroup/SLM) is made only once per work-group.
    %slm_root = memref.alloc() : memref<32x128xf16, 3>
    // Each thread offsets into the shared buffer by its thread id times the per-thread extent
    // (index constants %c16/%c32 elided for brevity, like %c8 above)
    %off_x = arith.muli %tx, %c16 : index
    %off_y = arith.muli %ty, %c32 : index
    %slm = memref.subview %slm_root[%off_x, %off_y] [16, 32] [1, 1]
         : memref<32x128xf16, 3> to memref<16x32xf16, strided<[128, 1], offset: ?>, 3>
    gpu.terminator
  }
  return
}
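
For readers who want to follow the index arithmetic outside of MLIR, below is a minimal, illustrative C++ sketch of the scaling and slicing shown above. The names SlmSlice and sliceForThread are hypothetical and not part of AllocsToSLM.cpp; the sketch only mirrors the shape-scaling and per-thread offset computation from the example.

#include <array>
#include <cstdint>
#include <iostream>

// Hypothetical helper mirroring the example above: given the per-thread 2-D
// allocation shape and the work-group's thread grid, compute the work-group-wide
// SLM shape and a given thread's offset into it.
struct SlmSlice {
  std::array<int64_t, 2> rootShape; // shape of the shared allocation
  std::array<int64_t, 2> offset;    // this thread's offset into it
};

SlmSlice sliceForThread(std::array<int64_t, 2> threadShape,   // per-thread alloc shape
                        std::array<int64_t, 2> threadsPerWg,  // {sz_tx, sz_ty}
                        std::array<int64_t, 2> threadId) {    // {tx, ty}
  SlmSlice s;
  for (int d = 0; d < 2; ++d) {
    // Scale each dimension by the number of threads along it...
    s.rootShape[d] = threadShape[d] * threadsPerWg[d];
    // ...and offset this thread by its id times the per-thread extent.
    s.offset[d] = threadId[d] * threadShape[d];
  }
  return s;
}

int main() {
  // Matches the example: 16x32 per thread, 2x4 threads -> 32x128 shared root.
  auto s = sliceForThread({16, 32}, {2, 4}, {1, 3});
  std::cout << s.rootShape[0] << "x" << s.rootShape[1] << " root, offset ("
            << s.offset[0] << ", " << s.offset[1] << ")\n"; // 32x128 root, offset (16, 96)
  return 0;
}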

@dchigarev marked this pull request as ready for review on November 10, 2024, 20:23
Review threads on lib/gc/Transforms/GPU/AllocsToSLM.cpp: 3 threads, all resolved (2 marked outdated)
Signed-off-by: dchigarev <[email protected]>
@dchigarev merged commit d9921c4 into intel:main on Nov 11, 2024; 6 checks passed

3 participants