Label delimitation problem fix with Qwen:
Mistral/Llama/Solar/Gemma tokenizers all have:
user_message + generation_prompt + answer != chat_template(user_message, answer)
as they always add an extra newline in the second case. Qwen doesn't do that, so our way of using the length of "user_message + generation_prompt" (:= position) to delimit the label for training does not work.
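For illustration, here is a minimal sketch of position-based label delimitation, assuming a Hugging Face tokenizer. The function name and the exact masking are assumptions for the example, not the actual bergen code:

```python
from transformers import AutoTokenizer

def build_labels(tokenizer, user_message: str, answer: str):
    """Mask everything before the answer with -100 so only the answer is trained on."""
    # Token ids of the prompt alone, rendered with the generation prompt.
    prompt_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_message}],
        add_generation_prompt=True,
        tokenize=True,
    )
    position = len(prompt_ids)  # assumed start of the label span

    # Token ids of the full conversation (prompt + answer).
    full_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_message},
         {"role": "assistant", "content": answer}],
        tokenize=True,
    )

    # This only lines up if the full template is an exact extension of the
    # prompt rendering; tokenizers that insert (or omit) an extra newline
    # between prompt and answer shift the answer start relative to `position`.
    labels = [-100] * position + full_ids[position:]
    return full_ids, labels

# Example usage (model name is only an example):
# tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
# input_ids, labels = build_labels(tokenizer, "What is RAG?", "Retrieval-augmented generation.")
```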
With this fix:
=> Behaviour is unchanged for all models but Qwen
=> Alexandre ran multiple experiments with Qwen and this code: everything works well, no warnings.
Qwen pad_token handling
Qwen has no bos_token, which we previously used as padding in bergen: we now take pad_token if it exists (it does for Qwen), and eos_token as a last resort.
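A minimal sketch of that fallback, assuming a Hugging Face tokenizer. The helper name is hypothetical and the exact fallback order (keeping bos_token as a middle step for models that have one) is an assumption, not the actual bergen code:

```python
def resolve_pad_token(tokenizer):
    # Prefer an explicit pad_token (Qwen defines one), then bos_token
    # (the previous bergen default, absent for Qwen), then eos_token
    # as a last resort.
    if tokenizer.pad_token is not None:
        return tokenizer.pad_token
    if tokenizer.bos_token is not None:
        return tokenizer.bos_token
    return tokenizer.eos_token

# Example usage:
# tokenizer.pad_token = resolve_pad_token(tokenizer)
```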