Hi @tridao,
Thank you for your time and for maintaining such a great project! We submitted pull request #1166 and wanted to kindly ask if you could take some time to review it.
The changes add support for using different hidden dimensions for qk and v, while remaining fully compatible with all existing configurations. We have also taken care to preserve the current implementation's performance, so there should be no impact on speed or efficiency.
The PR aims to enable a faster implementation of the recent paper Differential Transformer (https://arxiv.org/abs/2410.05258), which we believe would be a useful addition to the project. The full repository, forked from your project, is available at https://github.com/xiayuqing0622/customized-flash-attention.
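For context, here is a minimal usage sketch of what the change enables. It assumes the forked `flash_attn_func` keeps the upstream call signature and only relaxes the requirement that q/k and v share a head dimension; the shapes below are purely illustrative:

```python
# Minimal sketch (illustrative shapes only); assumes the forked flash_attn_func
# keeps the upstream signature but allows v's head dim to differ from q/k's.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads = 2, 1024, 8
d_qk, d_v = 128, 256  # e.g. v head dim = 2x qk head dim, as in Differential Transformer

q = torch.randn(batch, seqlen, nheads, d_qk, device="cuda", dtype=torch.float16)
k = torch.randn(batch, seqlen, nheads, d_qk, device="cuda", dtype=torch.float16)
v = torch.randn(batch, seqlen, nheads, d_v, device="cuda", dtype=torch.float16)

# Output follows v's head dimension: (batch, seqlen, nheads, d_v)
out = flash_attn_func(q, k, v, causal=True)
```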
If everything looks good, we’d greatly appreciate it if you could merge it into the main branch. Please let us know if there’s anything we need to address or adjust further.