
Unable to reproduce XTREME numbers #1327

Open

dapurv5 opened this issue Jul 12, 2021 · 2 comments

Comments


dapurv5 commented Jul 12, 2021

Unable to Reproduce the XTREME numbers for xlm-roberta-large

We are unable to reproduce the XTREME benchmark numbers reported in the original paper. We provide examples for PAWSX and XNLI below.

To Reproduce

  1. Branch: mainline
  2. Environment: 1 p4.8xlarge
  3. Hyperparams for each task, listed below:

XNLI

--model_type $MODEL_TYPE \
--model_name_or_path $MODEL \
--train_language en \
--task_name xnli \
--do_train \
--do_eval \
--do_predict \
--gradient_accumulation_steps 4 \
--per_gpu_train_batch_size 64 \
--learning_rate 2e-5 \
--num_train_epochs 2 \
--max_seq_length 128 \
--output_dir $SAVE_DIR/ \
--save_steps 500 \
--logging_steps 500 \
--eval_all_checkpoints \
--log_file 'train' \
--predict_languages "ar,bg,de,el,en,es,fr,hi,ru,sw,th,tr,ur,vi,zh" \
--save_only_best_checkpoint \
--overwrite_output_dir 

PAWSX

{
  "jiant_task_container_config_path": "/home/ec2-user/jiant/xtreme-exp/runconfigs/pawsx.json",
  "output_dir": "/home/ec2-user/jiant/xtreme-exp/runs/pawsx",
  "hf_pretrained_model_name_or_path": "xlm-roberta-large",
  "model_path": "/home/ec2-user/jiant/xtreme-exp/models/xlm-roberta-large/model/model.p",
  "model_config_path": "/home/ec2-user/jiant/xtreme-exp/models/xlm-roberta-large/model/config.json",
  "model_load_mode": "from_transformers",
  "do_train": true,
  "do_val": true,
  "do_save": true,
  "do_save_last": false,
  "do_save_best": false,
  "write_val_preds": false,
  "write_test_preds": true,
  "eval_every_steps": 1000,
  "save_every_steps": 0,
  "save_checkpoint_every_steps": 0,
  "no_improvements_for_n_evals": 5,
  "keep_checkpoint_when_done": false,
  "force_overwrite": true,
  "seed": 1146493838,
  "learning_rate": 3e-05,
  "adam_epsilon": 1e-08,
  "max_grad_norm": 1.0,
  "optimizer_type": "adam",
  "no_cuda": false,
  "fp16": false,
  "fp16_opt_level": "O1",
  "local_rank": -1,
  "server_ip": "",
  "server_port": ""
}
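
This config is consumed by jiant's main runscript. As a rough sketch of how we launch it (assuming jiant v2's Python entry point, i.e. RunConfiguration and run_loop in jiant.proj.main.runscript; the exact invocation below is our setup, not taken from the docs):

# Rough launch sketch, assuming jiant v2 exposes RunConfiguration/run_loop
# in jiant.proj.main.runscript; the values mirror the JSON config above.
import jiant.proj.main.runscript as main_runscript

run_args = main_runscript.RunConfiguration(
    jiant_task_container_config_path="/home/ec2-user/jiant/xtreme-exp/runconfigs/pawsx.json",
    output_dir="/home/ec2-user/jiant/xtreme-exp/runs/pawsx",
    hf_pretrained_model_name_or_path="xlm-roberta-large",
    model_path="/home/ec2-user/jiant/xtreme-exp/models/xlm-roberta-large/model/model.p",
    model_config_path="/home/ec2-user/jiant/xtreme-exp/models/xlm-roberta-large/model/config.json",
    model_load_mode="from_transformers",
    learning_rate=3e-5,
    eval_every_steps=1000,
    no_improvements_for_n_evals=5,
    do_train=True,
    do_val=True,
    do_save=True,
    write_test_preds=True,
    force_overwrite=True,
)
main_runscript.run_loop(run_args)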

Results

"pawsx": {
"accuracy": {"de": 55.25, 
             "en": 54.65, 
             "es": 54.65, 
             "fr": 54.85, 
             "ja": 55.85, 
             "ko": 55.15,
             "zh": 55.300000000000004}, 
"avg_accuracy": 55.1, 
"avg_metric": 55.1},

This number is far too low; we expected an average accuracy of around 80%.

Similarly, for XNLI the numbers we obtain are far lower than those reported in the paper.

Is there something we are missing?

sleepinyourhat (Contributor) commented

@zphang, mind taking a look?

zphang (Collaborator) commented Jul 26, 2021

Hi,

I believe the issue may have been that the XLM-R weights were not being loaded correctly because of a recent update. I've made a PR that should address this (#1329). Could you retry and let me know if it works?
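
In the meantime, one quick way to check whether the exported weights match the upstream checkpoint (a sketch on my side; it assumes model.p stores a plain state_dict of the Hugging Face model, so you may need to strip a key prefix such as roberta. before comparing):

# Sketch of a weight sanity check; assumes model.p is a plain state_dict
# of the Hugging Face model (adjust key prefixes if needed).
import torch
from transformers import AutoModel

hf_state = AutoModel.from_pretrained("xlm-roberta-large").state_dict()
exported = torch.load(
    "/home/ec2-user/jiant/xtreme-exp/models/xlm-roberta-large/model/model.p",
    map_location="cpu",
)

shared = sorted(set(hf_state) & set(exported))
print(f"{len(shared)} / {len(hf_state)} parameter names overlap")
differing = [k for k in shared if not torch.equal(hf_state[k], exported[k])]
print(f"{len(differing)} overlapping tensors differ from the upstream checkpoint")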
