
Trying to fine-tune NLLB but got an error: "Can't instantiate abstract class TrainModule with abstract methods requirements" #4989

Open
robotsp opened this issue Feb 21, 2023 · 9 comments



robotsp commented Feb 21, 2023

🐛 Bug

Trying to fine-tune NLLB, I got the following error:

Can't instantiate abstract class TrainModule with abstract methods requirements

CMD

python /fairseq-nllb/examples/nllb/modeling/train/train_script.py \
    cfg=bilingual \
    cfg/dataset=$DATA_CONFIG \
    cfg.dataset.lang_pairs="$SRC-$TGT" \
    cfg.fairseq_root=$FAIRSEQ_ROOT \
    cfg.output_dir=$OUTPUT_DIR \
    cfg.dropout=$DROP \
    cfg.warmup=10 \
    cfg.finetune_from_model=$MODEL_FOLDER/checkpoint.pt

Complete Error

The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="conf", config_name="base_config")
/usr/local/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Error executing job with overrides: ['cfg=bilingual', 'cfg/dataset=fbseed_bilingual.yaml', 'cfg.fairseq_root=/fairseq-nllb', 'cfg.output_dir=/output_nllb', 'cfg.dropout=0.1', 'cfg.warmup=10', 'cfg.finetune_from_model=/output_nllb/nllb_model/checkpoint.pt']
Traceback (most recent call last):
  File "/fairseq-nllb/examples/nllb/modeling/train/train_script.py", line 289, in main
    train_module = TrainModule(config)
TypeError: Can't instantiate abstract class TrainModule with abstract methods requirements

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Environment

  • fairseq Version (e.g., 1.0 or main): nllb latest
  • PyTorch Version (e.g., 1.0): 1.8.1+cu101
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed fairseq (pip, source): pip+source
  • Build command you used (if compiling from source): pip install --editable .
  • Python version: 3.8
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration: Tesla V100
  • Any other relevant information:

Additional context


ibtiRaj commented Feb 21, 2023

@robotsp hey, I got this error when I reinstalled Stopes with the new version. I think Fairseq is not compatible with the new version of Stopes. I solved this problem by using the old version.
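If it helps to confirm which stopes release is currently installed before up- or downgrading, here is a quick check using only the standard library (this assumes the package is registered under the distribution name stopes; the exact version that is compatible is not stated in this thread):

# Print the installed stopes version; importlib.metadata is part of the
# standard library on Python 3.8+.
from importlib.metadata import PackageNotFoundError, version

try:
    print("stopes", version("stopes"))
except PackageNotFoundError:
    print("stopes is not installed in this environment")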

robotsp closed this as completed Feb 22, 2023

robotsp commented Feb 22, 2023

Thanks @ibtiRaj

robotsp reopened this Feb 22, 2023

robotsp commented Feb 22, 2023

> @robotsp hey, I got this error when I reinstalled Stopes with the new version. I think Fairseq is not compatible with the new version of Stopes. I solved this problem by using the old version.

@ibtiRaj But it seems you ran into a new error when you downgraded stopes to the old version, right? And @kauterry suggested (facebookresearch/stopes#24) installing the new version of stopes to solve that error. I'm confused about whether there is an end-to-end solution :)

Best,


ibtiRaj commented Feb 22, 2023

@robotsp I'm confused too, I don't know what to do.


ibtiRaj commented Feb 22, 2023

@robotsp Can you tell me what your system configuration is, i.e. number of GPUs, GPU memory and system memory (RAM)?


robotsp commented Feb 24, 2023

> @robotsp Can you tell me what your system configuration is, i.e. number of GPUs, GPU memory and system memory (RAM)?

8 GPUs, 48 CPUs, 480 GB of memory, @ibtiRaj.
By the way, would you please share your run script and the config files you altered? Thanks!


ibtiRaj commented Feb 28, 2023

Hi @robotsp, to fine-tune the NLLB model I use this command:

srun python /home/admin/khadija/fairseq/examples/nllb/modeling/train/train_script.py \
    cfg=nllb200_dense3.3B_finetune_on_fbseed \
    cfg/dataset=bilingual \
    cfg.dataset.lang_pairs=ary_Arab-eng_Latn \
    cfg.fairseq_root=/home/admin/khadija/fairseq \
    cfg.output_dir=/home/admin/khadija/storagenas/fine_tune_nllb_output/model_fine_tuned \
    cfg.dropout=0.1 \
    cfg.warmup=10 \
    cfg.finetune_from_model=/home/admin/khadija/storagenas/projects/NLLB_modeles/checkpoint.pt

and here are my configuration files:

[screenshots of the two configuration files]

Is that what you meant?


The-Next commented Mar 3, 2023

Hello, I have the same problem as you.
I found that the problem is probably in stopes.
As the error reports, the abstract method requirements is not implemented in TrainModule.
This is likely because the nllb and stopes versions do not correspond; requirements may not have existed in earlier stopes versions. So I deleted the requirements method in stopes.stopes.core.stopes_module, and the program now runs normally. I hope this helps you.
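For reference, the TypeError itself comes from Python's abc machinery: a class that inherits from an abstract base class cannot be instantiated until every @abstractmethod is overridden. A minimal self-contained sketch of the situation and the two ways out (the class names here are stand-ins, not the real fairseq/stopes code):

from abc import ABC, abstractmethod

class StopesModule(ABC):            # stand-in for stopes.core.stopes_module.StopesModule
    @abstractmethod
    def requirements(self):
        ...

class TrainModule(StopesModule):    # stand-in for the NLLB TrainModule
    pass

# TrainModule()  ->  TypeError: Can't instantiate abstract class TrainModule
#                    with abstract methods requirements

# Way out 1 (what this comment describes): remove the @abstractmethod
# decorator from requirements() in the installed stopes source.
# Way out 2: override requirements() in the subclass so it is no longer abstract:
class FixedTrainModule(StopesModule):
    def requirements(self):
        return None                 # placeholder; a real implementation returns resource requirements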


martinbombin commented Mar 7, 2023

Hello, I have implemented the abstract method on my own in fairseq/examples/nllb/modeling/train/train_script.py (it is pretty simple).

[screenshot of the requirements() implementation added to train_script.py]

It worked for me. However, when I try to load the model, I get errors. I am also trying to fine-tune it, and it seems to initialise the model with my vocabulary instead of the vocabulary it was trained on. Does anyone have a solution to this problem?
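Since the screenshot is not visible here, a hedged sketch of what such an override might look like. It assumes stopes provides a Requirements dataclass in stopes.core.stopes_module; the import path, field names, and values below are assumptions and may differ in your stopes version, so check the installed source before copying:

# Hedged sketch, not the exact code from the screenshot.
from stopes.core.stopes_module import Requirements

class TrainModule(StopesModule):  # the existing class used by train_script.py
    # ... existing methods unchanged ...

    def requirements(self) -> Requirements:
        # Resource description consumed by the stopes launcher; the numbers
        # below are placeholders, not recommendations.
        return Requirements(
            nodes=1,
            gpus_per_node=8,
            cpus_per_task=8,
            timeout_min=4320,
        )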
