Additive Bias in Flash Attention #1219

Open
kkh517 opened this issue Sep 11, 2024 · 0 comments

Comments

kkh517 commented Sep 11, 2024

Hello @tridao !

I'm trying to apply FlashAttention to an algorithm that involves an additive attention bias. From what I understand, FlashAttention only supports an additive bias (such as ALiBi, Attention with Linear Biases) in the Triton version, and even there no gradient is computed for the bias. In my use case I need gradients to flow into the bias so that the related parameters are updated during backpropagation. I have a few questions:

1. Does FlashAttention support an additive bias?

2. If not, what challenges in CUDA programming prevent its implementation?

3. Why is the gradient not computed for the bias in the Triton version?

I'm not very experienced with CUDA, so this might not make perfect sense, but would it be possible to simply take the bias as an extra input, transfer it from HBM to SRAM, and fuse the softmax, addition, and matrix multiplication (i.e., softmax(QK^T + B)V) into a single kernel?
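
For concreteness, here is a minimal unfused PyTorch sketch of the computation I have in mind (shapes and names are purely illustrative); a fused kernel would ideally match this forward output and also produce `bias.grad`:

```python
import torch
import torch.nn.functional as F

# Unfused reference for the computation I have in mind:
#   out = softmax(Q @ K^T / sqrt(d) + B) @ V
# with the gradient flowing into B. Shapes and dtypes are just illustrative.
batch, heads, seqlen, dim = 2, 4, 128, 64

q = torch.randn(batch, heads, seqlen, dim)
k = torch.randn(batch, heads, seqlen, dim)
v = torch.randn(batch, heads, seqlen, dim)
# Learnable additive bias; requires_grad so the parameters producing it can be updated.
bias = torch.randn(batch, heads, seqlen, seqlen, requires_grad=True)

scores = q @ k.transpose(-2, -1) / dim ** 0.5  # (batch, heads, seqlen, seqlen)
scores = scores + bias                         # additive bias before the softmax
attn = F.softmax(scores, dim=-1)
out = attn @ v                                 # (batch, heads, seqlen, dim)

out.sum().backward()
print(bias.grad.shape)  # (batch, heads, seqlen, seqlen); dB equals dScores from the softmax backward
```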

I believe this has been raised in previous discussions, but it doesn't seem fully resolved. I'd appreciate any guidance.

Thanks!
