Flash Attention 3 does not use dropout_p? #1377
@tridao could you please confirm why dropout_p is not included in FA3?
The doc in the README says dropout can be passed?
Dropout is not supported in FA3.
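Since FA3 does not accept a dropout probability, one workaround (a sketch I'm adding for illustration, not something stated in the thread) is to apply dropout to the attention output yourself. A minimal NumPy illustration of inverted dropout, with hypothetical shapes:

```python
import numpy as np

def inverted_dropout(x, p, rng):
    """Inverted dropout: zero elements with probability p, rescale survivors by 1/(1-p)."""
    if p == 0.0:
        return x
    mask = rng.random(x.shape) >= p  # keep mask
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
# Hypothetical attention output: (batch, seq_len, head_dim)
attn_out = rng.standard_normal((2, 4, 8))
dropped = inverted_dropout(attn_out, p=0.1, rng=rng)
print(dropped.shape)  # (2, 4, 8)
```

Note this is dropout on the attention *output*, not on the attention probabilities inside the kernel as FA2's `dropout_p` does, so it is not a numerically equivalent substitute.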
@tridao I tried swapping a model that used FA2 (SDPA) with FA3, and I see a numerical mismatch in the results.
Do you have any tests that confirm numerical equivalence?
We have tests. If there's a mismatch, please help us by providing a short script to reproduce the error.
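For such a repro script, a plain NumPy reference implementation of scaled dot-product attention (non-causal, no dropout; all names and shapes here are illustrative, not from the library) gives a ground truth that both the FA2 and FA3 outputs can be compared against with a tolerance:

```python
import numpy as np

def reference_attention(q, k, v):
    """Plain scaled dot-product attention: softmax(q @ k^T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
# Hypothetical shapes: (batch, seq_len, head_dim)
q = rng.standard_normal((1, 16, 64)).astype(np.float32)
k = rng.standard_normal((1, 16, 64)).astype(np.float32)
v = rng.standard_normal((1, 16, 64)).astype(np.float32)
out = reference_attention(q, k, v)
print(out.shape)  # (1, 16, 64)
```

In a real repro you would run the same q/k/v through both attention paths and compare against this reference with `np.allclose` at an fp16/bf16-appropriate tolerance, since small differences from reduced-precision accumulation are expected.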
Hi, I was trying to train a model by swapping out FA2 (SDPA) with FA3; however, it does not use dropout_p?
Reference:
flash-attention/hopper/flash_attn_interface.py
Line 350 in f86e3dd
Also, the speedup I get in the forward pass is only around 10-20%.