Prepare new data for NLLB-200 #24
Comments
@ibtiRaj have you solved your problem?
@robotsp No, I haven't, I'm sorry.
No worries. BTW, may I ask about the model file and vocab file in your configs: are they the same as the original ones from NLLB?
@robotsp Yes, you are right, the vocabulary size is 255997, but when I run the fine-tuning I get a vocabulary size mismatch error. That's why I thought of adding 200 tokens to the original vocabulary.
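One way to sanity-check the vocabulary size quoted above is to count the entries in the dictionary file directly. The snippet below is only a sketch: it assumes the dictionary is a plain-text file with one entry per line (the usual fairseq layout of a token followed by a count), and `count_dict_entries` is a hypothetical helper, not part of fairseq or stopes. The loader may add special symbols and language tokens on top of this raw count, so a difference between this number and the model's embedding size does not by itself show which side is wrong.

```python
import tempfile
from pathlib import Path


def count_dict_entries(dict_path):
    """Count the non-empty lines in a plain-text dictionary file."""
    with open(dict_path, "r", encoding="utf-8") as fd:
        return sum(1 for line in fd if line.strip())


if __name__ == "__main__":
    # Demo on a throwaway file; point this at your own dictionary instead.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "dictionary.txt"  # hypothetical file name
        demo.write_text("hello 10\nworld 7\n", encoding="utf-8")
        print("entries:", count_dict_entries(demo))
```

Comparing this count for the original NLLB dictionary and for the dictionary written by prepare_data shows whether the two files already differ before any model is loaded.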
Hi @ibtiRaj! The Once you re-prepare your data with the latest code and the changed config, let me know if you still face any issues.
@kauterry Could you please have a look at facebookresearch/fairseq#4989?
Hi @kauterry, thank you for your answer. When I prepare my data with the new version of stopes, I always get two errors:
What do you think? And regarding the mismatch error, is it true that 200 new words can be added to the original vocabulary?
I can't find the nllb module in fairseq/examples in version 0.12.1, which is the version recommended by the new version of Stopes (https://github.com/facebookresearch/stopes/tree/main). But when I reinstalled the nllb version of fairseq, conflicts between hydra-core and fairseq occurred. I think this is the root cause. Do you know why? @kauterry @ibtiRaj
Hi @robotsp, I solved the problem by following the NLLB installation guide here: https://github.com/facebookresearch/fairseq/blob/nllb/INSTALL.md.
Hi, I'm trying to fine-tune the NLLB-200 model on new bilingual data, so I need to prepare my data using the prepare_data pipeline: https://github.com/facebookresearch/stopes/tree/main/stopes/pipelines/prepare_data
Here are my config files:
My output directory is the following:
But I encountered a problem when fine-tuning NLLB-200:
File "/home/admin/khadija/fairseq/slurm_snapshot_code/2023-02-08T14_51_26.242208/fairseq/data/dictionary.py", line 238, in add_from_file
with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/home/admin/khadija/prepare_data_output/data_bin/shard000/dict.ary_Arab.txt'
srun: error: slurmnode1: tasks 0-2: Exited with exit code 1
Is Fairseq compatible with the new version of Stopes?
@Mortimerp9 @kauterry @gwenzek Can you help me please?
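The traceback above shows fairseq looking for a per-language dictionary file named `dict.ary_Arab.txt` inside the shard directory. Before launching training, it can help to check which of those files actually exist. This is a minimal sketch, assuming one `dict.<lang>.txt` file is expected per language code; `find_missing_dicts` is a hypothetical helper and `eng_Latn` is an illustrative second language, neither taken from the post.

```python
import tempfile
from pathlib import Path


def find_missing_dicts(shard_dir, langs):
    """Return the expected dict.<lang>.txt paths that are absent from shard_dir."""
    shard = Path(shard_dir)
    expected = [shard / f"dict.{lang}.txt" for lang in langs]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Demo on a throwaway directory; replace it with your own data_bin/shard000.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "dict.eng_Latn.txt").write_text("hello 1\n", encoding="utf-8")
        for path in find_missing_dicts(tmp, ["eng_Latn", "ary_Arab"]):
            print("missing:", path.name)
```

If a file is reported missing, the data preparation output and the training config disagree about dictionary naming or location, which narrows the problem down before a job is submitted to the cluster.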