Running flash_attn/flash_attn_triton_amd/bench.py with sequence length > 4096 causes RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered #1440
Comments
What is the version of flash-attn you are using?
@yingyukexiansheng The version of flash-attn is 2.7.2.post1
I have the same error; my version of flash-attn is 2.6.3
I later tried the 2.4.3.post1 version and my problem was solved
I tried the 2.4.3.post1 version but I still failed to run the script. Could you share your virtual environment and the scripts you're running?
Hi,
I tried to run ./flash_attn/flash_attn_triton_amd/bench.py and ran into an issue while benchmarking FlashAttention-2 on a Triton setup. When both sequence-length inputs (-sq and -sk) are greater than 4096, the following error occurs during the backward pass. Here is my script.
It appears that when both -sq and -sk exceed 4096, the backward kernel fails with an illegal memory access. However, when the sequence lengths are less than or equal to 4096, the benchmarking completes without any issues.
The Triton version is 3.1.0, and the GPU is H20.
Does anyone have any idea?
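For reference, here is a minimal standalone sketch (not the original benchmark script) that exercises the same forward + backward path at a sequence length above 4096 through flash_attn.flash_attn_func. The shapes, head count, dtype, and causal flag are arbitrary choices for illustration, and it assumes the Triton backend is enabled as described in the repo (e.g. via the FLASH_ATTENTION_TRITON_AMD_ENABLE environment variable, if that is the switch in this tree):

```python
# Minimal repro sketch (illustrative only, not the reporter's script).
# Assumes flash-attn is installed and the Triton backend is enabled per the
# repo's instructions; seqlen > 4096 is the condition reported to fail.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 1, 8192, 16, 64  # seqlen > 4096

q = torch.randn(batch, seqlen, nheads, headdim,
                device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

out = flash_attn_func(q, k, v, causal=True)  # forward pass completes
out.sum().backward()                         # illegal memory access reported here
```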