# BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

https://arxiv.org/pdf/1910.13461.pdf

## Introduction

BART is a sequence-to-sequence model pretrained with a denoising objective. This pretraining objective is more generic than masked-language modeling alone: BART matches RoBERTa results on SQuAD and GLUE and achieves state-of-the-art results on summarization (XSum, CNN/DailyMail), long-form generative question answering (ELI5), and dialog response generation (ConvAI2). See the associated paper for more details.
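
As a quick illustration of the denoising idea, the sketch below (not part of this repo; the checkpoint name and input sentence are only placeholders for demonstration) uses the Hugging Face `transformers` API to let a pretrained BART reconstruct a sentence corrupted with a `<mask>` token:

```python
# Minimal sketch of BART's denoising behaviour: feed the model a corrupted
# sentence and let the seq2seq decoder generate the reconstructed text.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

corrupted = "UN Chief says there is no <mask> in Syria"
inputs = tokenizer(corrupted, return_tensors="pt")

# Beam search over the decoder fills in the masked span.
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```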

## Speedup BART (Fairseq version) by using FastSeq

- CNN daily mail validation data, NVIDIA-V100-16GB

  | BatchSize       | 32             | 64             | 128            | 320            |
  |-----------------|----------------|----------------|----------------|----------------|
  | fairseq-0.10.2  | 3.3 samples/s  | OOM            | OOM            | OOM            |
  | above + fastseq | 10.7 samples/s | 17.1 samples/s | 21.8 samples/s | 25.1 samples/s |

### Model

| Model           | Description                                    | # params | Download                |
|-----------------|------------------------------------------------|----------|-------------------------|
| bart.base       | BART model with 6 encoder and decoder layers   | 140M     | bart.base.tar.gz        |
| bart.large      | BART model with 12 encoder and decoder layers  | 400M     | bart.large.tar.gz       |
| bart.large.mnli | bart.large finetuned on MNLI                   | 400M     | bart.large.mnli.tar.gz  |
| bart.large.cnn  | bart.large finetuned on CNN-DM                 | 400M     | bart.large.cnn.tar.gz   |
| bart.large.xsum | bart.large finetuned on XSum                   | 400M     | bart.large.xsum.tar.gz  |

`bart.large.cnn` is used in the speed benchmark.
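
The sketch below shows, at a minimum, how one of these checkpoints can be loaded and run with fairseq's Python API (assuming `bart.large.cnn.tar.gz` has been downloaded and extracted into `./bart.large.cnn`; the input sentence is a placeholder):

```python
from fairseq.models.bart import BARTModel

# Load the fine-tuned CNN-DM checkpoint from the extracted archive.
bart = BARTModel.from_pretrained("bart.large.cnn", checkpoint_file="model.pt")
bart.eval()

# Generation settings mirror the benchmark command in the Setting section below.
summaries = bart.sample(
    ["The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building."],
    beam=4,
    lenpen=2.0,
    max_len_b=140,
    min_len=55,
    no_repeat_ngram_size=3,
)
print(summaries[0])
```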

### Task

CNN/DM validation data

### Setting

```bash
$ fastseq-generate-for-fairseq \
      cnn_dm/len-1024.bin \
      --path bart.large.cnn/model.pt \
      --fp16 \
      --task translation \
      --batch-size BATCH_SIZE \
      --gen-subset valid \
      --truncate-source \
      --bpe gpt2 \
      --beam 4 \
      --num-workers 4 \
      --min-len 55 \
      --max-len-b 140 \
      --no-repeat-ngram-size 3 \
      --lenpen 2.0
```

To get the baseline fairseq speed number, replace `fastseq-generate-for-fairseq` with `fairseq-generate`.

### Code Example

Refer to the example file.
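
The full example lives in the linked file; the sketch below is only a minimal version of the usual FastSeq integration pattern, assuming that importing `fastseq` before fairseq is enough to apply the optimizations (as described in the FastSeq documentation):

```python
import fastseq  # assumption: importing fastseq first patches fairseq's generation
from fairseq.models.bart import BARTModel

bart = BARTModel.from_pretrained("bart.large.cnn", checkpoint_file="model.pt")
bart.eval()

# Same fairseq API as before; the FastSeq speedups are applied transparently.
print(bart.sample(["Some long news article to summarize ..."], beam=4, lenpen=2.0)[0])
```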

## Speedup BART (Huggingface Transformers version) by using FastSeq

- CNN daily mail validation data, NVIDIA-V100-16GB

  | BatchSize           | 32             | 64             | 128            |
  |---------------------|----------------|----------------|----------------|
  | transformers-4.12.0 | 4.5 samples/s  | 4.5 samples/s  | OOM            |
  | above + fastseq     | 10.6 samples/s | 11.7 samples/s | 12.4 samples/s |

### Model

`facebook/bart-large-cnn` from the Hugging Face model hub.

### Task

CNN/DM validation data

### Setting

```bash
$ fastseq-generate-for-transformers \
    facebook/bart-large-cnn \
    cnn_dm/val.source \
    out.summary \
    --reference_path cnn_dm.1k/val.target \
    --device cuda \
    --bs BATCH_SIZE \
    --fp16 \
    --score_path out.score \
    --task summarization
```

The baseline speed number is obtained by running the Transformers v4.12.0 code.

### Code Example

Refer to the example file.
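
Again, the linked file contains the real example; the sketch below is a hedged Transformers-side equivalent, assuming `import fastseq` before `transformers` enables the optimizations and that a CUDA device is available (mirroring `--device cuda --fp16` above):

```python
import fastseq  # assumption: importing fastseq first patches transformers' generation
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn").cuda().half()

article = "Some long news article to summarize ..."
inputs = tokenizer([article], return_tensors="pt", truncation=True, max_length=1024).to("cuda")

# Beam-search settings chosen to match the CNN-DM generation flags used elsewhere in this README.
ids = model.generate(
    inputs["input_ids"],
    num_beams=4,
    min_length=55,
    max_length=140,
    no_repeat_ngram_size=3,
    length_penalty=2.0,
)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```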