The execution order between GEMM0 of the next iteration and GEMM1 of the current iteration in Pingpong scheduling pipeline for overlapping gemms and softmax between warpgroups #1398

Open
tengdecheng opened this issue Dec 19, 2024 · 0 comments

tengdecheng commented Dec 19, 2024

Hi, I have a question about the Pingpong scheduling pipeline. I studied Figure 1 (Pingpong scheduling for 2 warpgroups to overlap softmax and GEMMs) and found that after warpgroup1 and warpgroup2 have completed softmax, warpgroup2 signals warpgroup1 through an arrive barrier. At this point, warpgroup1 continues with GEMM1 and GEMM0.
My question is about the order of these two: in Figure 1, GEMM1 (PV of the current iteration) is executed before GEMM0 (QK^T of the next iteration).
image
But in the code, it seems to be the opposite: GEMM0 of the next iteration is executed first, followed by GEMM1 of the current iteration. The execution order appears to be:

warp_scheduler_barrier_sync();
// GEMM0 (QK^T) of the next iteration
flash::gemm</*zero_init=*/true, /*wg_wait=*/-1>(tiled_mma0, tSrQ, tSrK(_, _, _, smem_pipe_read_k.index()), tSrS);
softmax.rescale_o(tOrO, scores_scale);
consumer_wait(pipeline_v, smem_pipe_read_v);
// GEMM1 (PV) of the current iteration
flash::gemm</*zero_init=*/false, /*wg_wait=*/-1>(tiled_mma1, tOrP, tOrV(_, _, _, smem_pipe_read_v.index()), tOrO);
warp_scheduler_barrier_arrive();
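
The order read from this snippet can be made concrete with a toy host-side model. This is only a sketch of the program order within one warpgroup as described in this question, not the FlashAttention code: `issue_order`, `position`, the string labels, and the iteration indexing are all hypothetical, and nothing here models what the hardware does after the calls are issued.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (assumption): list the calls one warpgroup issues, in program
// order, if the loop body is exactly the six statements quoted above.
// Step i issues GEMM0 for iteration i + 1 and GEMM1 for iteration i.
std::vector<std::string> issue_order(int num_steps) {
    assert(num_steps >= 0);
    std::vector<std::string> ops;
    for (int i = 0; i < num_steps; ++i) {
        ops.push_back("sync");                                  // warp_scheduler_barrier_sync()
        ops.push_back("GEMM0[" + std::to_string(i + 1) + "]");  // QK^T, next iteration
        ops.push_back("rescale_o");                             // softmax.rescale_o(...)
        ops.push_back("wait_v");                                // consumer_wait(pipeline_v, ...)
        ops.push_back("GEMM1[" + std::to_string(i) + "]");      // PV, current iteration
        ops.push_back("arrive");                                // warp_scheduler_barrier_arrive()
    }
    return ops;
}

// Index of the first occurrence of `op`, or -1 if it was never issued.
int position(const std::vector<std::string>& ops, const std::string& op) {
    for (std::size_t k = 0; k < ops.size(); ++k) {
        if (ops[k] == op) return static_cast<int>(k);
    }
    return -1;
}
```

With `issue_order(2)`, `position(ops, "GEMM0[1]")` is smaller than `position(ops, "GEMM1[0]")`, which is the ordering this question is about: in program order, GEMM0 of the next iteration comes before GEMM1 of the current one, whereas Figure 1 draws them the other way round.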

image

I cannot find the part of the implementation that guarantees GEMM1 of the current iteration is executed before GEMM0 of the next iteration. Could you please tell me where it is? Thank you very much.
