Add support for qk hidden dim different from v hidden dim #1166

smallscientist1 · 2024-08-20T07:43:48Z

This PR adds support for:

different hidden dimensions for qk and v;
unequal num_heads_k and num_heads_v, for example (num_heads_q, num_heads_k, num_heads_v) = (32, 4, 16).
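To make the two features concrete, here is a minimal unfused reference in NumPy. It is only a sketch of the intended semantics, not the kernel code from this PR. The function name `attention_ref` and the head mapping (consecutive query heads share one k head and one v head, GQA-style) are assumptions; the PR description does not spell out the mapping.

```python
import numpy as np


def attention_ref(q, k, v):
    """Unfused reference attention (hypothetical helper, not part of the PR).

    q: (seqlen_q, num_heads_q, qk_head_dim)
    k: (seqlen_k, num_heads_k, qk_head_dim)
    v: (seqlen_k, num_heads_v, v_head_dim)
    returns: (seqlen_q, num_heads_q, v_head_dim)
    """
    _, num_heads_q, qk_head_dim = q.shape
    seqlen_k, num_heads_k, _ = k.shape
    _, num_heads_v, _ = v.shape
    assert k.shape[2] == qk_head_dim, "q and k must share the qk head dim"
    assert v.shape[0] == seqlen_k, "k and v must share the key sequence length"
    assert num_heads_q % num_heads_k == 0 and num_heads_q % num_heads_v == 0

    # Assumed mapping: each k (or v) head serves a contiguous group of q heads.
    k_full = np.repeat(k, num_heads_q // num_heads_k, axis=1)
    v_full = np.repeat(v, num_heads_q // num_heads_v, axis=1)

    # The softmax scale depends only on the qk head dim.
    scale = 1.0 / np.sqrt(qk_head_dim)
    scores = np.einsum("qhd,khd->hqk", q, k_full) * scale
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)

    # The output inherits the v head dim, which may differ from the qk head dim.
    return np.einsum("hqk,khd->qhd", probs, v_full)


rng = np.random.default_rng(0)
q = rng.standard_normal((8, 32, 32))   # 32 query heads, QKHeadDim=32
k = rng.standard_normal((8, 4, 32))    # 4 key heads
v = rng.standard_normal((8, 16, 64))   # 16 value heads, VHeadDim=64
out = attention_ref(q, k, v)
print(out.shape)  # (8, 32, 64)
```

The output has one head per query head, and its last dimension follows v rather than qk.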

For differing qk and v hidden dimensions, the following configurations are supported:

FlashAttention-2 with QKHeadDim=32, VHeadDim=64
FlashAttention-2 with QKHeadDim=64, VHeadDim=128
FlashAttention-2 with QKHeadDim=96, VHeadDim=192
FlashAttention-2 with QKHeadDim=128, VHeadDim=256
FlashAttention-2 with QKHeadDim=192, VHeadDim=128

For head dimensions not listed above, you can use the autotuner to generate an implementation. Details are in autotuner.md.

Performance

We measure the speedup relative to a baseline that pads the qk and v hidden dimensions to the same length.
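The padding baseline can be sketched as follows. This assumes the baseline zero-pads q, k and v to the larger of the two head dims and slices the output back; the PR does not show its benchmark code, so the helper names here are hypothetical. The sketch also includes a rough matmul FLOP ratio, which is a theoretical estimate and not a measured result.

```python
import numpy as np


def softmax_attention(q, k, v, scale):
    """Single-head attention: q (Sq, D), k (Sk, D), v (Sk, Dv) -> (Sq, Dv)."""
    scores = (q @ k.T) * scale
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs @ v


def pad_last_dim(x, target):
    """Zero-pad the last axis of a 2-D array up to `target` columns."""
    return np.pad(x, [(0, 0), (0, target - x.shape[-1])])


def padded_attention(q, k, v):
    """Assumed padding baseline: pad to a common head dim, then slice the output."""
    qk_dim, v_dim = q.shape[-1], v.shape[-1]
    common = max(qk_dim, v_dim)
    # Zero columns leave q @ k.T unchanged, but the scale must use the original qk dim.
    scale = 1.0 / np.sqrt(qk_dim)
    out = softmax_attention(
        pad_last_dim(q, common), pad_last_dim(k, common), pad_last_dim(v, common), scale
    )
    return out[:, :v_dim]


def matmul_flop_ratio(qk_dim, v_dim):
    """Forward matmul FLOPs without padding divided by FLOPs with padding."""
    return (qk_dim + v_dim) / (2 * max(qk_dim, v_dim))


rng = np.random.default_rng(0)
for qk_dim, v_dim in [(32, 64), (192, 128)]:
    q = rng.standard_normal((16, qk_dim))
    k = rng.standard_normal((16, qk_dim))
    v = rng.standard_normal((16, v_dim))
    direct = softmax_attention(q, k, v, 1.0 / np.sqrt(qk_dim))
    assert np.allclose(direct, padded_attention(q, k, v))
    print(qk_dim, v_dim, matmul_flop_ratio(qk_dim, v_dim))
```

Under these assumptions padding gives the same result but spends extra work on zero columns, so a kernel that handles the two dims natively has less arithmetic and memory traffic to do.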

Test

We add unit tests in tests/test_flash_attn_headdim.py and tests/test_flash_attn_head.py.

iqiancheng · 2024-08-21T08:01:04Z

hi~ @smallscientist1
Regarding the combinations of qk and v dimensions you've implemented in FlashAttention-2, which configuration have you found to offer the best balance between performance and model effectiveness? Specifically, among the combinations:

QKHeadDim=32, VHeadDim=64
QKHeadDim=64, VHeadDim=128
QKHeadDim=96, VHeadDim=192
QKHeadDim=128, VHeadDim=256

Which one stands out in terms of computational efficiency and model quality?

xiayuqing0622 · 2024-08-22T03:32:37Z

In terms of model quality, it's too early to make a definitive assessment since the work is still in progress. However, several teams we've collaborated with expressed a need for this combination, so we implemented it. Additionally, anticipating that others might find it useful, we created this PR to benefit the broader community.

ehartford · 2024-11-28T02:51:55Z

@tridao can you please check on this issue? I believe that this is a very significant contribution to Flash Attention.

tridao · 2024-11-28T03:21:29Z

Thanks for this contribution. This is very impressive!
However I think having different qk headdim and v headdim complicates the code and increases the maintenance workload. I believe it's better to have this in a separate fork, until there's significant adoption of this attention variant.

xiayuqing0622 · 2024-11-28T06:30:41Z

Thank you for your feedback! We’ll keep it in a separate fork for now.

xiayuqing0622 and others added 15 commits August 13, 2024 01:37

intermediate save

5eeef1e

support var dim

331a601

modify readme

02da101

compatible

2bce87c

Merge branch 'main' into dim

51a8bcb

test_head_dim

e8b4082

add test headdim

ebf0b16

fix some config bug

ab35fc2

update test headdim

4e94c20

Merge branch 'Dao-AILab:main' into dim

e31b6a4

update test headdim splitkv

89dbe52

Merge commit '89dbe521b48000ee4f3d942d7c3498c698817159' into dim

fc094a4

update ReadMe.md

d11b7ae

remove unused file

21ca4bc

revert Readme

4c3462a

smallscientist1 force-pushed the dim_pr branch from c5ec69d to 4c3462a Compare August 20, 2024 08:45

create bench headdim

f63411d

smallscientist1 and others added 2 commits August 22, 2024 09:11

update bench result

3e0c7c4

update Readme

3caa059

smallscientist1 and others added 9 commits August 22, 2024 08:32

reorg code to reduce compile time

493a430

update (128,256) config

0607e6c

add (192,128)

fd6fc29

add config (192,128)

b6d7493

fix bug

85fb8d2

fix bug backward

f0644c2

fix bug

0092285

Add support for dim(192,128) (#1)

6e88a4d

* create bench headdim
* update bench result
* update Readme
* reorg code to reduce compile time
* update (128,256) config
* add (192,128)
* add config (192,128)
* fix bug
* fix bug backward
* fix bug

add optional dim compile

255cd5a

smallscientist1 and others added 24 commits September 4, 2024 14:56

update flash api head

18b309d

fix interface bug

6909ab4

Merge pull request #2 from xiayuqing0622/head

3c8bb2b

Support different num_head of k and v

update README

5f26eb0

benchmark head_headdim

536a8cc

fix bench bug

ca6335d

fix bug for numhead

def41c0

add autotuner

6e8d537

basetuner fwd

83fd7a5

update autotuner FLashFwd

7cf4858

autotuner fwd

1ca8397

update code

1e5c49d

update autotuner log

409bdde

update tunner

d4b620a

fix bug kernel launch

be21a0a

update autotuner tile space

90fa651

update cutlass bugfix

1ba39eb

add autotuner doc

c5fa3c9

Merge pull request #3 from xiayuqing0622/dim_autotuner

31ea0bb

Dim autotuner

update readme

b09eaee

update autotuner

cd9fee4

update readme

014c349

Merge branch 'dim_pr' into dim_pr1

cd91625

Merge pull request #4 from xiayuqing0622/dim_pr1

d578cff

merge to dim_pr

smallscientist1 marked this pull request as ready for review September 19, 2024 08:37

YTianZHU mentioned this pull request Nov 27, 2024

Add support for qk dim different from v dim in PR #1166 #1358

Closed

ehartford mentioned this pull request Nov 27, 2024

Please Publicize xiayuqing0622/flex_head_fa#6

Closed
