Why Does FA3 Use Registers Instead of Directly Accessing SMEM with WGMMA on SM90? #1407

Open
ziyuhuang123 opened this issue Dec 23, 2024 · 0 comments


I learned that WGMMA on SM90 can read its operands directly from shared memory (SMEM), so they do not have to be loaded into registers first. However, FA3 still appears to pass its operands to the GEMM from registers. What is the reason for this?

For example:

flash::gemm</*zero_init=*/true, /*wg_wait=*/-1>(tiled_mma0, tSrQ, tSrK(_, _, _, smem_pipe_read_k.index()), tSrS);

https://github.com/Dao-AILab/flash-attention/blob/0dfb28174333d9eefb7c1dd4292690a8458d1e89/hopper/mainloop_fwd_sm90_tma_gmma_ws.hpp#L724C9-L724C122
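For context on what "operands from SMEM" means at the instruction level: as far as the PTX ISA documents it, a `wgmma.mma_async` operand that lives in shared memory is passed as a 64-bit matrix descriptor, and that descriptor value is itself held in a register. The host-side sketch below packs such a descriptor using the documented bit layout. The names `encode14`, `make_smem_desc`, and `Swizzle` are hypothetical helpers written for illustration and are not part of FA3 or CUTLASS. Whether `tSrQ` and `tSrK` in the line above hold descriptors of this kind, rather than loaded matrix data, is an assumption to check against the source.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the PTX "matrix-descriptor-encode" rule:
// keep the low 18 bits of the value and drop the bottom 4 (16-byte units).
constexpr std::uint64_t encode14(std::uint64_t x) {
    return (x & 0x3FFFFull) >> 4;
}

// Swizzle mode values as listed for bits 63-62 of the descriptor.
enum class Swizzle : std::uint64_t { None = 0, B128 = 1, B64 = 2, B32 = 3 };

// Pack a shared-memory matrix descriptor:
//   bits 13-0   encoded start address
//   bits 29-16  encoded leading-dimension byte offset
//   bits 45-32  encoded stride-dimension byte offset
//   bits 63-62  swizzle mode
// The base-offset field (bits 51-49) is left at zero in this sketch.
constexpr std::uint64_t make_smem_desc(std::uint32_t smem_addr,
                                       std::uint32_t lbo_bytes,
                                       std::uint32_t sbo_bytes,
                                       Swizzle sw) {
    return encode14(smem_addr)
         | (encode14(lbo_bytes) << 16)
         | (encode14(sbo_bytes) << 32)
         | (static_cast<std::uint64_t>(sw) << 62);
}

// Compile-time usage example with made-up address and offsets.
static_assert(make_smem_desc(0x1000, 16, 1024, Swizzle::B128)
                  == 0x4000004000010100ull,
              "descriptor fields land in the expected bit positions");
```

If that reading holds, an operand with an `r`-style name would not by itself mean the matrix data was copied into registers. It could be a register-resident handle that points WGMMA at the SMEM tile.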
