Add barrier to fix race in spinning locks
xiaohuguo2023 committed Aug 19, 2024
1 parent 624335f commit 49b0a46
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions python/perf-kernels/streamk/streamk_kernel.py
@@ -201,6 +201,7 @@ def streamk_gemm(
 rn1 = tl.max_contiguous(tl.multiple_of(rn1, BLOCK_SIZE_N), BLOCK_SIZE_N)
 P_ = P + pid * BLOCK_SIZE_M * BLOCK_SIZE_N + rm1[:, None] * BLOCK_SIZE_N + rn1[None, :]
 tl.store(P_, acc)
+tl.debug_barrier()
 tl.atomic_xchg(locks + pid, 1)

 start_iter = end_iter
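
Note on the ordering this change enforces: the partial tile must be fully written to `P` before the flag in `locks` is set, because another program spins on that flag and then reads the tile. The added `tl.debug_barrier()` synchronizes the threads of the block between the store and the `tl.atomic_xchg`, so presumably the race was the flag being published while some threads were still storing. The following is only a CPU-side analogy in plain Python, not the kernel code: `producer`, `consumer`, `run`, `NUM_WORKERS`, and `TILE` are hypothetical names, and `threading.Event` stands in for the lock array.

```python
import threading

NUM_WORKERS = 4   # hypothetical number of producing programs
TILE = 8          # hypothetical size of one partial tile

# Stand-ins for the P buffer and the locks array used in the kernel.
partials = [[0.0] * TILE for _ in range(NUM_WORKERS)]
locks = [threading.Event() for _ in range(NUM_WORKERS)]


def producer(pid):
    # Step 1: write the whole partial tile (the tl.store in the diff).
    for i in range(TILE):
        partials[pid][i] = float(pid + 1)
    # Step 2: publish the flag only after every element is written.
    # In the kernel, the barrier placed before the atomic exchange
    # is what keeps these two steps in order across the block.
    locks[pid].set()


def consumer(result):
    for pid in range(NUM_WORKERS):
        locks[pid].wait()            # wait until the tile is published
        for i in range(TILE):
            result[i] += partials[pid][i]


def run():
    result = [0.0] * TILE
    workers = [threading.Thread(target=producer, args=(p,))
               for p in range(NUM_WORKERS)]
    reader = threading.Thread(target=consumer, args=(result,))
    reader.start()
    for t in workers:
        t.start()
    for t in workers + [reader]:
        t.join()
    return result
```

If the flag were set before the loop in `producer` finished, `consumer` could read a partly written tile and accumulate stale values, which is the kind of failure the commit title describes.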
