Hi there, I was reading the README and saw:

> Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.

But I'm curious: would it make sense to set `-DSD_FLASH_ATTN=ON` for the Mac, Linux, and other non-CUBLAS builds?

Thanks!
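For context, a minimal sketch of what setting that flag would look like in a non-CUBLAS build (the `SD_FLASH_ATTN` option comes from the README quoted above; the repository path and generic CMake invocation are illustrative, not the project's documented build steps):

```shell
# Illustrative build without CUBLAS, opting into flash attention.
# SD_FLASH_ATTN is the CMake option referenced in the README;
# everything else here is a generic CMake workflow, not project docs.
cmake -B build -DSD_FLASH_ATTN=ON
cmake --build build --config Release
```

The question above is essentially whether this `-DSD_FLASH_ATTN=ON` should become the default for builds where CUBLAS is off, since the memory saving has no stated downside there.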