Add support for qk dim different from v dim in PR #1166 #1358

Closed
YTianZHU opened this issue Nov 27, 2024 · 0 comments

YTianZHU commented Nov 27, 2024

Hi @tridao ,

Thank you for your time and for maintaining such a great project! We submitted pull request #1166 and would like to kindly ask if you could take some time to review it.

The changes add support for a qk hidden dimension that differs from the v hidden dimension, while remaining fully compatible with all existing configurations. We have also taken care to preserve the current implementation's performance, so there should be no impact on speed or efficiency.

The PR enables a faster implementation of the recent Differential Transformer paper (https://arxiv.org/abs/2410.05258), and we believe it would be a useful addition to the project. The full repository, forked from your project, is at https://github.com/xiayuqing0622/customized-flash-attention.
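For reference, here is a minimal sketch of how the new option might be used, assuming the fork keeps the existing `flash_attn_func` entry point and simply relaxes the check that q, k, and v share a head dimension (the shapes and head dimensions below are purely illustrative):

```python
import torch
from flash_attn import flash_attn_func  # assumption: the fork keeps this entry point

batch, seqlen, nheads = 2, 1024, 8
headdim_qk, headdim_v = 64, 128  # illustrative: qk head dim differs from v head dim

q = torch.randn(batch, seqlen, nheads, headdim_qk, dtype=torch.float16, device="cuda")
k = torch.randn(batch, seqlen, nheads, headdim_qk, dtype=torch.float16, device="cuda")
v = torch.randn(batch, seqlen, nheads, headdim_v, dtype=torch.float16, device="cuda")

# With the PR, v's head dimension no longer has to match q/k's;
# the output then follows v's head dimension: (batch, seqlen, nheads, headdim_v).
out = flash_attn_func(q, k, v, causal=True)
```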

If everything looks good, we’d greatly appreciate it if you could merge it into the main branch. Please let us know if there’s anything we need to address or adjust further.
