
Questions Related to the Application and Results of Attention Sinks After the Paper #66

Open
dsdanielpark opened this issue Nov 14, 2023 · 0 comments

Hello, I was deeply impressed by your paper.

  1. I thought that many models would adopt attention sinks, since the issue of the initial token receiving a disproportionate amount of attention weight was resolved. However, even after some time has passed, they are not being adopted as widely as I expected. May I ask what the authors think the reason for this might be?

  2. I am curious whether it is better to apply attention sinks during training or at inference, and whether any performance degradation has been observed since the paper. Intuitively, I do not expect a significant overall speedup, but I wonder whether accuracy should not be slightly higher. Alternatively, it also seems intuitive that giving more weight to the early parts of a sequence might improve the model's overall understanding of it.

  3. Ultimately, the main point seems to be that the paper addresses the disproportionate attention weight assigned to the initial tokens, but I'm curious why the technique is not in universal use. I also wonder whether sink attention that is dispersed across the whole sequence, rather than concentrated on the initial tokens, could maintain accuracy while improving speed, and how it could best be utilized.
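For concreteness, the cache-eviction behavior these questions refer to can be sketched roughly as follows. This is a minimal sketch of the StreamingLLM-style policy (keep a few initial "sink" tokens plus a sliding window of recent tokens); the function and parameter names are my own, not from the authors' code:

```python
def evict_kv_cache(positions, num_sinks=4, window=1020):
    """Return the token positions kept in the KV cache after eviction.

    `positions` is the list of token positions currently cached.
    The first `num_sinks` positions (attention sinks) are always kept,
    along with the `window` most recent positions; everything in
    between is evicted. Default values are illustrative.
    """
    if len(positions) <= num_sinks + window:
        return positions  # cache still fits, nothing to evict
    return positions[:num_sinks] + positions[-window:]


# Example: with 2000 cached tokens, 4 sinks, and a window of 8,
# only the 4 initial positions and the 8 most recent survive.
kept = evict_kv_cache(list(range(2000)), num_sinks=4, window=8)
```

The point of the sketch is that the sinks absorb the disproportionate attention mass that would otherwise land on whatever token happens to be first in the window, which is why the initial tokens are pinned rather than evicted.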

In short, I am curious how the authors' thinking has evolved since the paper.

Thank you! :)
