flash_attn_varlen_func
Flash Attention allows one to pass cu_seqlens_q and cu_seqlens_k in order to pack multiple short sequences into a single batch example. This avoids wasting any compute or memory on handling padding.

I was wondering whether there are any plans to support similar behavior in ThunderKittens. Thanks!
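For context, a minimal pure-Python sketch of how these cumulative-sequence-length arrays are typically constructed (in flash-attn itself they are int32 tensors on the GPU, and the packed q/k/v tensors have shape (total_tokens, nheads, headdim); the helper name here is illustrative, not part of any library):

```python
from itertools import accumulate

def cu_seqlens(seqlens):
    """Cumulative sequence lengths: cu_seqlens[i] is the start offset of
    sequence i in the packed token dimension, cu_seqlens[-1] the total."""
    return [0] + list(accumulate(seqlens))

# Three variable-length sequences packed back-to-back into 10 tokens.
lens = [3, 5, 2]
bounds = cu_seqlens(lens)  # [0, 3, 8, 10]
# Sequence i occupies rows bounds[i]:bounds[i+1] of the packed tensor,
# so no padding tokens are ever materialized or attended over.
```

The kernel then uses consecutive pairs of these offsets to delimit each sequence, which is what lets attention stay confined within sequence boundaries without a padding mask.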