How to reproduce the BLEU score with 2 GPU cards? #4
Comments
Please take a look at #3 (comment). In your case, set
I tried setting --update-freq 3, but the training loss still decreases slowly.
2 GPUs / freq=3:
epoch | train_loss | valid_loss
Should I try to increase --max-tokens, or is this result just due to the difference between 2 and 6 GPUs? Thank you.
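For reference, the effective batch size in fairseq is roughly num_GPUs × --max-tokens × --update-freq tokens per optimizer step, so matching a run that used more GPUs usually means raising --update-freq rather than --max-tokens. A minimal sketch, assuming the reference result came from 6 GPUs with --max-tokens 4000 and --update-freq 1 (those reference settings are an assumption here, not confirmed in this thread):
# Assumed reference run: 6 GPUs x 4000 max-tokens x update-freq 1 ≈ 24,000 tokens per update.
# To approximate the same effective batch on 2 GPUs, accumulate gradients over 3 steps:
#   2 GPUs x 4000 max-tokens x update-freq 3 ≈ 24,000 tokens per update.
CUDA_VISIBLE_DEVICES=0,1 fairseq-train data-bin/wmt17_en_zh \
  -a transformer --optimizer adam -s en -t zh \
  --max-tokens 4000 --update-freq 3 \
  --lr-scheduler inverse_sqrt --warmup-updates 10000 --lr 0.001
With the same tokens per update, the learning-rate schedule should also advance at a comparable rate per epoch, which is one way to sanity-check the log against the reference.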
The update frequency in the log still looks weird to me; you can check out my log file here: #3 (comment). For example, after the first epoch, your learning rate is much lower than mine.
Hi, can someone please guide me on how to start the training correctly? I created the dataset the same way. I have around 24 million sentences, but when I start training the loss does not decrease, and the BLEU score is only around 0.85 after 5 epochs and 3 days of training on a single P100 GPU. I am using this command: CUDA_VISIBLE_DEVICES=0 fairseq-train
My env:
2 × NVIDIA GeForce RTX 2080 Ti
pytorch 1.5.0
Data source: http://www.statmt.org/wmt17/translation-task.html
including "News Commentary v12" and "UN Parallel Corpus V1.0"
Data preprocessing follows prepare.sh
Train:
CUDA_VISIBLE_DEVICES=0,1 fairseq-train data-bin/wmt17_en_zh -a transformer --optimizer adam -s en -t zh \
  --label-smoothing 0.1 --dropout 0.3 --max-tokens 4000 --min-lr '1e-09' --lr-scheduler inverse_sqrt \
  --weight-decay 0.0001 --criterion label_smoothed_cross_entropy --max-update 1000000 \
  --warmup-updates 10000 --warmup-init-lr '1e-7' --lr '0.001' --adam-betas '(0.9, 0.98)' --adam-eps '1e-09' \
  --clip-norm 25.0 --keep-last-epochs 10 --save-dir checkpoints_test |& tee -a wmt17_train.test.log
Then I got a very bad score:
2020-07-28 11:22:01 | INFO | fairseq_cli.generate | Generate test with beam=5: BLEU4 = 0.00, 6.4/0.0/0.0/0.0 (BP=0.444, ratio=0.552, syslen=26013, reflen=47155)
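For anyone trying to reproduce that evaluation line, it presumably comes from fairseq-generate. A minimal sketch, assuming checkpoint_best.pt and BPE-encoded test data (the exact checkpoint and post-processing used here are assumptions):
# Generate translations with beam search and score them with BLEU
fairseq-generate data-bin/wmt17_en_zh \
  --path checkpoints_test/checkpoint_best.pt \
  --beam 5 --remove-bpe |& tee wmt17_generate.test.log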
The training log is here:
https://drive.google.com/file/d/11l5c8VFH1nmZxjbVhD15U3PbWHBFkCtd/view?usp=sharing
Can you give me some suggestions about this result?
Thank you!