Since the GEMM here uses wgmma.async, the subsequent barrier.arrive cannot guarantee that GEMM execution is complete, right? It can only ensure that the GEMM has been issued, correct? Otherwise, if strictly following the order of GEMM0, GEMM1, and SOFTMAX, it would be impossible to achieve the overlap within the warp group as shown in the figure below:
Assuming the GEMM is asynchronous and the softmax is synchronous (it must finish executing before the warp group proceeds), I tried combining the intra-warpgroup and inter-warpgroup illustrations and roughly drew something like this...
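To make the issue-versus-completion distinction concrete, here is a toy timeline model in Python. It is only a sketch under the same assumption as above (a GEMM is merely issued and completes later, while softmax blocks the issuing warp group). The function `simulate`, its parameters, and the issue order are hypothetical and do not reproduce the actual kernel schedule. It compares a schedule in which every GEMM blocks until it finishes with one in which the warp group waits only when it needs a result.

```python
def simulate(n_blocks, gemm_time, softmax_time, async_gemm):
    """Toy model: one warp group issuing work to one tensor-core pipeline.

    Hypothetical sketch, not the real kernel schedule. Returns the time
    at which all work for n_blocks KV blocks has finished.
    """
    thread = 0.0  # time at which the issuing warp group is free
    tensor = 0.0  # time at which the tensor cores are free

    def issue_gemm():
        nonlocal thread, tensor
        start = max(thread, tensor)  # GEMMs run in order on the tensor cores
        done = start + gemm_time
        tensor = done
        if not async_gemm:
            thread = done  # synchronous: issuing also waits for completion
        return done

    def wait(done):
        nonlocal thread
        thread = max(thread, done)  # explicit wait for an earlier GEMM

    s_done = issue_gemm()  # GEMM0 for block 0
    wait(s_done)
    o_done = 0.0
    for j in range(n_blocks):
        # Issue GEMM0 for the next block before running softmax on this one.
        next_s_done = issue_gemm() if j + 1 < n_blocks else None
        thread += softmax_time  # softmax(j) occupies the warp group
        o_done = issue_gemm()   # GEMM1 for block j
        if next_s_done is not None:
            wait(next_s_done)   # softmax(j + 1) needs GEMM0(j + 1)
    wait(o_done)  # drain the last GEMM1
    return thread


if __name__ == "__main__":
    print("blocking GEMM:", simulate(4, 1.0, 1.0, async_gemm=False))
    print("async GEMM:   ", simulate(4, 1.0, 1.0, async_gemm=True))
```

With four blocks and unit costs, the blocking variant takes 12 time units and the asynchronous variant takes 8, because each softmax runs while a previously issued GEMM is still executing. If issuing implied completion, the two variants would be identical and no overlap within the warp group would be possible.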