tokenization mismatch #7
Thank you for sharing the great source code. I have been trying to pretrain and fine-tune with LLaMA 3.1. While the pretraining works fine, I noticed that warnings about a tokenization mismatch occur during the fine-tuning process, preventing the model from training properly.

After checking the source code, I found that in the train.py file, within the preprocess_llama_3_1() function, the cur_len value becomes 4 larger than it should be because of a single line of code. As a result, all targets are set to IGNORE_INDEX and the model does not train. When I commented out this line, the issue disappeared and training worked properly. Was this line intentionally included?
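For context, LLaVA-style preprocess functions walk through each conversation round, mask the instruction tokens with IGNORE_INDEX, and advance a running offset (cur_len); at the end, cur_len is compared against the true sequence length, and on a mismatch the entire sample is masked out. Below is a minimal, hypothetical sketch of that pattern, not the actual LLaVA-MORE code: the names mask_targets, round_lens, instruction_lens, and extra_offset are illustrative, with extra_offset standing in for the extra increment described above.

```python
# Hypothetical sketch of the LLaVA-style target-masking loop; names are
# illustrative and do not match the LLaVA-MORE source exactly.
IGNORE_INDEX = -100

def mask_targets(target, round_lens, instruction_lens, total_len, extra_offset=0):
    """Mask instruction tokens round by round, then sanity-check the offset.

    `extra_offset` simulates the extra increment to `cur_len` discussed above.
    """
    cur_len = 0
    for round_len, instruction_len in zip(round_lens, instruction_lens):
        # The instruction part of each round is excluded from the loss.
        target[cur_len : cur_len + instruction_len] = [IGNORE_INDEX] * instruction_len
        cur_len += round_len
    cur_len += extra_offset  # the suspect adjustment

    # If the running offset does not line up with the real sequence length,
    # the whole sample is masked and contributes nothing to training.
    if cur_len != total_len:
        target[:] = [IGNORE_INDEX] * len(target)
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}")
    return target

# With extra_offset=4 the final check fails, so every position becomes
# IGNORE_INDEX and the sample is effectively dropped from the loss:
labels = list(range(10))
mask_targets(labels, round_lens=[6, 4], instruction_lens=[3, 2], total_len=10, extra_offset=4)
```

Under this pattern, any increment to cur_len that is not matched by real tokens in the sequence pushes the offset past total_len, which is consistent with the behavior reported above: every sample triggers the mismatch branch and all targets end up ignored.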
Comments

@ohhan777 Can you send me the logs after commenting it out?

@federico1-creator This is not solved even after commenting out that line. Can you look into it?

Hi everyone, thank you for your interest in our project! We have conducted some tests to better understand the differences in behavior between the code we're running and the tokenization mismatch issue you mentioned. To fix this issue you can use https://github.com/aimagelab/LLaVA-MORE/blob/main/scripts/more/11_pretrain_llama_31_acc_st_1.sh

@federico1-creator Thanks for this, will check it.

If I train everything from scratch, could I get this error too?