Hi! Thank you for your interesting paper and its implementation! I have a few questions I hope you can clarify:
When employing the pre-trained model with a "sink token," is this token also prepended to the input during inference? If so, could you explain why Figure 7 presents visualizations with identical token lengths for the two models? If not, is the added trainable "sink token" identical or functionally equivalent to each model's BOS token (e.g., `<s>`), ensuring compatibility between inference and the training corpus?
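For concreteness, this is a minimal sketch of what I mean by prepending a dedicated sink token at inference. The `SINK_TOKEN_ID` value, the helper name, and the tensor shapes are my own assumptions for illustration and are not taken from your implementation:

```python
import torch

# Hypothetical sketch: prepend a dedicated, learnable sink token before the
# user input at inference time. SINK_TOKEN_ID is an assumed id appended to
# the vocabulary during pre-training; it is not from the released code.
SINK_TOKEN_ID = 32000

def prepend_sink(input_ids: torch.Tensor) -> torch.Tensor:
    """Prepend the sink token to a batch of token ids of shape (batch, seq_len)."""
    sink = torch.full(
        (input_ids.size(0), 1), SINK_TOKEN_ID,
        dtype=input_ids.dtype, device=input_ids.device,
    )
    return torch.cat([sink, input_ids], dim=1)
```

If the sink token is instead just each model's existing `<s>`, this step collapses to the tokenizer's usual BOS handling, which is exactly the distinction I am trying to understand.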
The ablation study on the number of initial tokens suggests that incorporating just one initial token still yields reasonable results for most models, except perhaps for Llama-2. Given that, if four initial tokens are optimal, have you experimented with training models with four additional "sink tokens" to match this assumption?
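To make this second question concrete, the cache policy I have in mind when talking about the "number of initial tokens" is roughly the following sketch; the cache layout, the `num_sink`/`window` parameters, and the function name are assumptions for illustration, not your API:

```python
import torch

def evict_kv(past_key_values, num_sink: int = 4, window: int = 1024):
    """
    Sketch of a sink-plus-sliding-window policy: always keep the first
    `num_sink` cached positions (the attention sinks) plus the most recent
    `window` positions, dropping everything in between. Assumes each layer's
    cache is a (key, value) pair shaped (batch, heads, seq_len, head_dim).
    """
    new_cache = []
    for k, v in past_key_values:
        seq_len = k.size(2)
        if seq_len <= num_sink + window:
            new_cache.append((k, v))
            continue
        k = torch.cat([k[:, :, :num_sink], k[:, :, -window:]], dim=2)
        v = torch.cat([v[:, :, :num_sink], v[:, :, -window:]], dim=2)
        new_cache.append((k, v))
    return new_cache
```

My question is whether `num_sink=4` at inference should be mirrored by four trainable sink tokens during pre-training, rather than the single one described in the paper.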
By the way, my own research also touches on the role of initial tokens in LLMs, and I find your findings quite complementary to my experimental results. I would be delighted to discuss this further if you are interested, and good luck with your ICLR result :)