BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
https://arxiv.org/pdf/1910.13461.pdf
BART is a sequence-to-sequence model trained with denoising as the pretraining objective. We show that this pretraining objective is more generic: BART matches RoBERTa results on SQuAD and GLUE, and achieves state-of-the-art results on summarization (XSum, CNN/DailyMail), long-form generative question answering (ELI5), and dialogue response generation (ConvAI2). See the associated paper for more details.
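For quick experimentation, the pre-trained checkpoints can also be loaded through PyTorch Hub. Below is a minimal sketch of encoding/decoding and denoising-style mask filling, assuming `torch` and `fairseq` are installed (the example sentences are illustrative only):

```python
import torch

# Load the pre-trained bart.large checkpoint via PyTorch Hub
bart = torch.hub.load('pytorch/fairseq', 'bart.large')
bart.eval()  # disable dropout for deterministic inference

# Round-trip a sentence through BART's BPE encoder/decoder
tokens = bart.encode('Hello world!')
print(bart.decode(tokens))  # 'Hello world!'

# Denoising in action: fill masked spans with the top-3 candidates
print(bart.fill_mask(['The cat <mask> on the mat.'], topk=3, beam=10))
```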
- CNN daily mail validation data, NVIDIA-V100-16GB

BatchSize | 32 | 64 | 128 | 320 |
---|---|---|---|---|
fairseq-0.10.2 | 3.3 samples/s | OOM | OOM | OOM |
above + fastseq | 10.7 samples/s | 17.1 samples/s | 21.8 samples/s | 25.1 samples/s |
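The "above + fastseq" rows use the same checkpoint with fastseq's generation optimizations applied. Per the fastseq documentation, the optimizations are typically enabled by importing the package before loading the model; a minimal sketch, assuming fastseq and fairseq are installed:

```python
import fastseq  # noqa: F401 -- importing fastseq patches in the speedups
import torch

# Load the CNN-DM checkpoint and decode exactly as with vanilla fairseq;
# the optimized generation is applied transparently.
bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.cuda().eval().half()
```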
Model | Description | # params | Download |
---|---|---|---|
bart.base | BART model with 6 encoder and decoder layers | 140M | bart.base.tar.gz |
bart.large | BART model with 12 encoder and decoder layers | 400M | bart.large.tar.gz |
bart.large.mnli | bart.large finetuned on MNLI | 400M | bart.large.mnli.tar.gz |
bart.large.cnn | bart.large finetuned on CNN-DM | 400M | bart.large.cnn.tar.gz |
bart.large.xsum | bart.large finetuned on XSum | 400M | bart.large.xsum.tar.gz |
`bart.large.cnn` is used in the speed benchmark.
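As a rough illustration of what the benchmark exercises, the extracted checkpoint can be run directly from Python. Below is a minimal sketch using fairseq's `BARTModel` interface with the same decoding hyperparameters as the benchmark command below, assuming the archive was extracted into `bart.large.cnn/` (the article text is a placeholder):

```python
import torch
from fairseq.models.bart import BARTModel

# Load the fine-tuned CNN-DM checkpoint extracted from bart.large.cnn.tar.gz
bart = BARTModel.from_pretrained('bart.large.cnn', checkpoint_file='model.pt')
bart.cuda()
bart.eval()
bart.half()  # FP16, matching the --fp16 flag in the benchmark command

with torch.no_grad():
    # Same decoding hyperparameters as the benchmark command below
    hypotheses = bart.sample(
        ['(CNN) ... placeholder article text to summarize ...'],
        beam=4,
        lenpen=2.0,
        max_len_b=140,
        min_len=55,
        no_repeat_ngram_size=3,
    )
print(hypotheses[0])
```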
CNN/DM validation data:

```bash
$ fastseq-generate-for-fairseq \
    cnn_dm/len-1024.bin \
    --path bart.large.cnn/model.pt \
    --fp16 \
    --task translation \
    --batch-size BATCH_SIZE \
    --gen-subset valid \
    --truncate-source \
    --bpe gpt2 \
    --beam 4 \
    --num-workers 4 \
    --min-len 55 \
    --max-len-b 140 \
    --no-repeat-ngram-size 3 \
    --lenpen 2.0
```
To get the baseline fairseq speed numbers, replace `fastseq-generate-for-fairseq` with `fairseq-generate`. Refer to the file for details.
- CNN daily mail validation data, NVIDIA-V100-16GB

BatchSize | 32 | 64 | 128 |
---|---|---|---|
transformers-4.12.0 | 4.5 samples/s | 4.5 samples/s | OOM |
above + fastseq | 10.6 samples/s | 11.7 samples/s | 12.4 samples/s |
`facebook/bart-large-cnn` from the model hub is used in the speed benchmark.
CNN/DM validation data:

```bash
$ fastseq-generate-for-transformers \
    facebook/bart-large-cnn \
    cnn_dm/val.source \
    out.summary \
    --reference_path cnn_dm.1k/val.target \
    --device cuda \
    --bs BATCH_SIZE \
    --fp16 \
    --score_path out.score \
    --task summarization
```
The baseline speed numbers are obtained by running the Transformers v4.12.0 code. Refer to the file for details.
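For reference, the same decoding settings as the fairseq benchmark above map onto the Transformers generation API as follows; a minimal sketch, assuming Transformers v4.x and a CUDA device (the article text is a placeholder):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
model.half().cuda().eval()  # FP16 on GPU, matching --fp16 / --device cuda

# Placeholder: e.g. one article (one line) from cnn_dm/val.source
article = '(CNN) ... placeholder article text to summarize ...'

inputs = tokenizer(article, max_length=1024, truncation=True,
                   return_tensors='pt').to('cuda')
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,             # --beam 4
        min_length=55,           # --min-len 55
        max_length=140,          # --max-len-b 140
        no_repeat_ngram_size=3,  # --no-repeat-ngram-size 3
        length_penalty=2.0,      # --lenpen 2.0
    )
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```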