The execution order between GEMM0 of the next iteration and GEMM1 of the current iteration in Pingpong scheduling pipeline for overlapping gemms and softmax between warpgroups
#1398
Open
tengdecheng opened this issue
Dec 19, 2024
· 0 comments
Hi, I have a question about the Pingpong scheduling pipeline. I studied Figure 1 (Pingpong scheduling for 2 warpgroups to overlap softmax and GEMMs) and found that after warpgroup1 and warpgroup2 have completed softmax, warpgroup2 signals an arrive on the barrier to warpgroup1. At this point, warpgroup1 goes on to execute GEMM1 and GEMM0.
My question is about the order of these two GEMMs. In Figure 1, GEMM1 (PV of the current iteration) executes before GEMM0 (QK^T of the next iteration).
But the code seems to do the opposite: GEMM0 of the next iteration is executed first, and then GEMM1 of the current iteration. The execution order appears to be the following:
I can't find the part of the implementation that guarantees GEMM1 of the current iteration executes before GEMM0 of the next iteration. Could you please tell me where it is? Thank you very much.
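To make the two orders being compared concrete, here is a small Python model of the per-warpgroup issue order over a few iterations, with GEMM0 = QK^T and GEMM1 = PV. It is an illustrative sketch, not code from the repository: the function names are hypothetical, and it only models program order inside one warpgroup, ignoring the asynchronous execution of the GEMMs and the barrier between warpgroups. It shows that both orders respect the per-iteration dependency GEMM0 → softmax → GEMM1 and differ only in whether GEMM1[j] comes before GEMM0[j+1].

```python
from typing import List


def figure1_order(num_iters: int) -> List[str]:
    """Hypothetical per-warpgroup order as described for Figure 1:
    GEMM1 (PV) of iteration j comes before GEMM0 (QK^T) of iteration j+1."""
    ops: List[str] = []
    for j in range(num_iters):
        ops += [f"GEMM0[{j}]", f"softmax[{j}]", f"GEMM1[{j}]"]
    return ops


def code_order(num_iters: int) -> List[str]:
    """Hypothetical per-warpgroup order as the question reads the code:
    GEMM0 of iteration j+1 is issued before GEMM1 of iteration j."""
    ops: List[str] = ["GEMM0[0]", "softmax[0]"]
    for j in range(num_iters - 1):
        ops += [f"GEMM0[{j + 1}]", f"GEMM1[{j}]", f"softmax[{j + 1}]"]
    ops.append(f"GEMM1[{num_iters - 1}]")
    return ops


def respects_dependencies(ops: List[str], num_iters: int) -> bool:
    """Check the data dependencies inside each iteration:
    GEMM0[j] must precede softmax[j], which must precede GEMM1[j]."""
    pos = {op: i for i, op in enumerate(ops)}
    return all(
        pos[f"GEMM0[{j}]"] < pos[f"softmax[{j}]"] < pos[f"GEMM1[{j}]"]
        for j in range(num_iters)
    )


def gemm1_before_next_gemm0(ops: List[str], num_iters: int) -> bool:
    """True if GEMM1[j] precedes GEMM0[j+1] for every j (the Figure 1 ordering)."""
    pos = {op: i for i, op in enumerate(ops)}
    return all(
        pos[f"GEMM1[{j}]"] < pos[f"GEMM0[{j + 1}]"]
        for j in range(num_iters - 1)
    )


if __name__ == "__main__":
    n = 3
    for name, order in [("Figure 1", figure1_order(n)), ("code", code_order(n))]:
        print(name, order)
        print("  dependencies ok:", respects_dependencies(order, n))
        print("  GEMM1[j] before GEMM0[j+1]:", gemm1_before_next_gemm0(order, n))
```

Both orders pass the dependency check; only the Figure 1 order passes the GEMM1-before-next-GEMM0 check, which is exactly the property the question asks about.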
tengdecheng
changed the title
The execution order of GEMM0 (next iteration) and GEMM1 (current iteration) in Pingpong scheduling pipeline for overlapping gemms and softmax
The execution order of GEMM0 of the next iteration and GEMM1 of the current iteration in Pingpong scheduling pipeline for overlapping gemms and softmax between warpgroups
Dec 19, 2024
tengdecheng
changed the title
The execution order of GEMM0 of the next iteration and GEMM1 of the current iteration in Pingpong scheduling pipeline for overlapping gemms and softmax between warpgroups
The execution order between GEMM0 of the next iteration and GEMM1 of the current iteration in Pingpong scheduling pipeline for overlapping gemms and softmax between warpgroups
Dec 19, 2024