BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
https://arxiv.org/pdf/1910.13461.pdf
BART is a sequence-to-sequence model trained with denoising as the pretraining objective. We show that this pretraining objective is more generic: BART matches RoBERTa results on SQuAD and GLUE, and achieves state-of-the-art results on summarization (XSum, CNN/DailyMail), long-form generative question answering (ELI5), and dialogue response generation (ConvAI2). See the associated paper for more details.
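For quick experimentation, the pre-trained checkpoints can also be loaded through PyTorch Hub. Below is a minimal sketch of encoding/decoding and denoising-style mask filling, assuming `torch` and `fairseq` are installed (the example sentences are illustrative only):

```python
import torch

# Load the pre-trained bart.large checkpoint via PyTorch Hub
bart = torch.hub.load('pytorch/fairseq', 'bart.large')
bart.eval()  # disable dropout for deterministic inference

# Round-trip a sentence through BART's BPE encoder/decoder
tokens = bart.encode('Hello world!')
print(bart.decode(tokens))  # 'Hello world!'

# Denoising in action: fill masked spans with the top-3 candidates
print(bart.fill_mask(['The cat <mask> on the mat.'], topk=3, beam=10))
```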
- CNN daily mail validation data, NVIDIA-V100-16GB

BatchSize | 32 | 64 | 128 | 320 |
---|---|---|---|---|
fairseq-0.10.2 | 3.3 samples/s | OOM | OOM | OOM |
above + fastseq | 10.7 samples/s | 17.1 samples/s | 21.8 samples/s | 25.1 samples/s |
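The "above + fastseq" rows use the same checkpoint with fastseq's generation optimizations applied. Per the fastseq documentation, the optimizations are typically enabled by importing the package before loading the model; a minimal sketch, assuming fastseq and fairseq are installed:

```python
import fastseq  # noqa: F401 -- importing fastseq patches in the speedups
import torch

# Load the CNN-DM checkpoint and decode exactly as with vanilla fairseq;
# the optimized generation is applied transparently.
bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.cuda().eval().half()
```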
Model | Description | # params | Download |
---|---|---|---|
bart.base | BART model with 6 encoder and decoder layers | 140M | bart.base.tar.gz |
bart.large | BART model with 12 encoder and decoder layers | 400M | bart.large.tar.gz |
bart.large.mnli | bart.large finetuned on MNLI | 400M | bart.large.mnli.tar.gz |
bart.large.cnn | bart.large finetuned on CNN-DM | 400M | bart.large.cnn.tar.gz |
bart.large.xsum | bart.large finetuned on XSum | 400M | bart.large.xsum.tar.gz |
`bart.large.cnn` is used in the speed benchmark.
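As a rough illustration of what the benchmark exercises, the extracted checkpoint can be run directly from Python. Below is a minimal sketch using fairseq's `BARTModel` interface with the same decoding hyperparameters as the benchmark command below, assuming the archive was extracted into `bart.large.cnn/` (the article text is a placeholder):

```python
import torch
from fairseq.models.bart import BARTModel

# Load the fine-tuned CNN-DM checkpoint extracted from bart.large.cnn.tar.gz
bart = BARTModel.from_pretrained('bart.large.cnn', checkpoint_file='model.pt')
bart.cuda()
bart.eval()
bart.half()  # FP16, matching the --fp16 flag in the benchmark command

with torch.no_grad():
    # Same decoding hyperparameters as the benchmark command below
    hypotheses = bart.sample(
        ['(CNN) ... placeholder article text to summarize ...'],
        beam=4,
        lenpen=2.0,
        max_len_b=140,
        min_len=55,
        no_repeat_ngram_size=3,
    )
print(hypotheses[0])
```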
CNN/DM validation data:

```bash
$ fastseq-generate-for-fairseq \
    cnn_dm/len-1024.bin \
    --path bart.large.cnn/model.pt \
    --fp16 \
    --task translation \
    --batch-size BATCH_SIZE \
    --gen-subset valid \
    --truncate-source \
    --bpe gpt2 \
    --beam 4 \
    --num-workers 4 \
    --min-len 55 \
    --max-len-b 140 \
    --no-repeat-ngram-size 3 \
    --lenpen 2.0
```
To get the baseline fairseq speed numbers, replace `fastseq-generate-for-fairseq` with `fairseq-generate`. Refer to the file for details.
- CNN daily mail validation data, NVIDIA-V100-16GB

BatchSize | 32 | 64 | 128 |
---|---|---|---|
transformers-4.12.0 | 4.5 samples/s | 4.5 samples/s | OOM |
above + fastseq | 10.6 samples/s | 11.7 samples/s | 12.4 samples/s |
`facebook/bart-large-cnn` from the model hub is used in the speed benchmark.
CNN/DM validation data:

```bash
$ fastseq-generate-for-transformers \
    facebook/bart-large-cnn \
    cnn_dm/val.source \
    out.summary \
    --reference_path cnn_dm.1k/val.target \
    --device cuda \
    --bs BATCH_SIZE \
    --fp16 \
    --score_path out.score \
    --task summarization
```
The baseline speed numbers are obtained by running the Transformers v4.12.0 code. Refer to the file for details.
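For reference, the same decoding settings as the fairseq benchmark above map onto the Transformers generation API as follows; a minimal sketch, assuming Transformers v4.x and a CUDA device (the article text is a placeholder):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
model.half().cuda().eval()  # FP16 on GPU, matching --fp16 / --device cuda

# Placeholder: e.g. one article (one line) from cnn_dm/val.source
article = '(CNN) ... placeholder article text to summarize ...'

inputs = tokenizer(article, max_length=1024, truncation=True,
                   return_tensors='pt').to('cuda')
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,             # --beam 4
        min_length=55,           # --min-len 55
        max_length=140,          # --max-len-b 140
        no_repeat_ngram_size=3,  # --no-repeat-ngram-size 3
        length_penalty=2.0,      # --lenpen 2.0
    )
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```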