Add support for qk dim different from v dim in PR #1166 #1358

Closed
YTianZHU opened this issue Nov 27, 2024 · 0 comments

YTianZHU commented Nov 27, 2024

Hi @tridao ,

Thank you for your time and for maintaining such a great project! We submitted pull request #1166 and would like to kindly ask if you could take some time to review it.

The changes add support for a qk hidden dimension that differs from the v hidden dimension, while remaining fully compatible with all existing configurations. We have also taken care to preserve the current implementation's performance, so there should be no impact on speed or efficiency.

The PR enables a faster implementation of the recent Differential Transformer paper (https://arxiv.org/abs/2410.05258), and we believe it would be a useful addition to the project. The full repository, forked from your project, is at https://github.com/xiayuqing0622/customized-flash-attention.
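For reference, here is a minimal sketch of how the new option might be used, assuming the fork keeps the existing `flash_attn_func` entry point and simply relaxes the check that q, k, and v share a head dimension (the shapes and head dimensions below are purely illustrative):

```python
import torch
from flash_attn import flash_attn_func  # assumption: the fork keeps this entry point

batch, seqlen, nheads = 2, 1024, 8
headdim_qk, headdim_v = 64, 128  # illustrative: qk head dim differs from v head dim

q = torch.randn(batch, seqlen, nheads, headdim_qk, dtype=torch.float16, device="cuda")
k = torch.randn(batch, seqlen, nheads, headdim_qk, dtype=torch.float16, device="cuda")
v = torch.randn(batch, seqlen, nheads, headdim_v, dtype=torch.float16, device="cuda")

# With the PR, v's head dimension no longer has to match q/k's;
# the output then follows v's head dimension: (batch, seqlen, nheads, headdim_v).
out = flash_attn_func(q, k, v, causal=True)
```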

If everything looks good, we’d greatly appreciate it if you could merge it into the main branch. Please let us know if there’s anything we need to address or adjust further.
