Commit

add barrier to fix racing for spinning locks (#632)
xiaohuguo2023 authored Aug 19, 2024
1 parent 15cb3a8 commit 177d0bd
Showing 1 changed file with 1 addition and 0 deletions.
python/perf-kernels/streamk/streamk_kernel.py
@@ -201,6 +201,7 @@ def streamk_gemm(
rn1 = tl.max_contiguous(tl.multiple_of(rn1, BLOCK_SIZE_N), BLOCK_SIZE_N)
P_ = P + pid * BLOCK_SIZE_M * BLOCK_SIZE_N + rm1[:, None] * BLOCK_SIZE_N + rn1[None, :]
tl.store(P_, acc)
+tl.debug_barrier()
tl.atomic_xchg(locks + pid, 1)

start_iter = end_iter
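
The ordering this commit enforces (store the partial tile, synchronize, then set the lock) can be sketched with a CPU-side analogy in plain Python. This is only an illustration under stated assumptions, not the kernel's code. The names `run_once`, `partial`, `lock_flag`, `NUM_WORKERS` and `TILE` are hypothetical, `threading.Barrier` stands in for `tl.debug_barrier()`, and a plain list element stands in for the entry updated by `tl.atomic_xchg(locks + pid, 1)`. The assumed failure mode is that, without the barrier, the flag could be published before every thread of the program has finished its part of the store, so a spinning consumer might read an incomplete tile.

```python
import threading
import time

NUM_WORKERS = 4   # hypothetical: stands in for the threads of one program
TILE = 8          # hypothetical: elements stored by each worker


def run_once():
    partial = [0] * (NUM_WORKERS * TILE)   # plays the role of the P buffer
    lock_flag = [0]                        # plays the role of locks[pid]
    barrier = threading.Barrier(NUM_WORKERS)
    result = []

    def producer(worker_id):
        # Each worker stores its slice of the partial tile (cf. tl.store(P_, acc)).
        for i in range(TILE):
            partial[worker_id * TILE + i] = 1
        # Wait until every worker has finished storing (cf. tl.debug_barrier()).
        barrier.wait()
        # Only after the barrier is the flag published (cf. tl.atomic_xchg).
        if worker_id == 0:
            lock_flag[0] = 1

    def consumer():
        # Spin on the flag, then read the whole tile.
        while lock_flag[0] != 1:
            time.sleep(0)  # yield so the producers can make progress
        result.append(sum(partial))

    threads = [threading.Thread(target=producer, args=(w,))
               for w in range(NUM_WORKERS)]
    threads.append(threading.Thread(target=consumer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

With the barrier in place, the consumer in this sketch always observes all `NUM_WORKERS * TILE` elements. Moving the flag write above `barrier.wait()` would let the consumer read while some slices are still unwritten, which is the kind of race the commit title describes.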
