Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.703

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and other recent pretraining schemes. We evaluate a number of noising approaches,…
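As a rough illustration of the corrupt-then-reconstruct objective described in the abstract, the sketch below feeds a span-masked sentence through a pretrained BART checkpoint and lets the decoder fill it in. It assumes the Hugging Face transformers library and the facebook/bart-large checkpoint, neither of which is part of this page; the masked sentence is an arbitrary example.

```python
# Minimal sketch of BART's denoise-and-reconstruct idea, assuming the
# Hugging Face `transformers` library (the paper itself uses fairseq).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Corrupt the input: replace a span with the <mask> token (text infilling).
corrupted = "BART is a denoising <mask> for pretraining sequence-to-sequence models."
inputs = tokenizer(corrupted, return_tensors="pt")

# The left-to-right decoder reconstructs the uncorrupted text.
output_ids = model.generate(**inputs, num_beams=5, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```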

Cited by 3,552 publications (3,370 citation statements)
References 19 publications
“…It has achieved superior performance on machine translation tasks with significantly less training time. Currently, large Transformers [32,133,149], which are pre-trained on a massive text corpus with self-supervised objectives, have achieved superior results in a variety of downstream NLP tasks such as machine understanding [32,84], question answering [27,86], and abstractive text summarization [34,72,85,112,148,153]. Zhang et al [153] demonstrated that their pre-trained encoder-decoder model can outperform previous state-of-the-art results [28,36,44,63,67,99,100,119,121] on several datasets by fine-tuning with limited supervised examples, which shows that pre-trained models are promising candidates for zero-shot and low-resource summarization tasks.…”
Section: Beyond Rnn-based Seq2seq Modelsmentioning
confidence: 99%
“…We use fairseq-py (Ott et al., 2019) to train the QABRIEFER. We use the open-sourced BART model (Lewis et al., 2019) and the suggested fine-tuning hyperparameters, training for 10 epochs and taking the best epoch by validation loss. To generate, we use beam search with beam size 5.…”
Section: Model Detailsmentioning
confidence: 99%
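The decoding step quoted above (beam search with beam size 5) can be sketched roughly as follows, loading the released BART checkpoint through fairseq's torch.hub entry point. The fine-tuned QABRIEFER weights are not available here, so the generic bart.large checkpoint and the input sentence are stand-ins, not the cited setup.

```python
# Hedged sketch of beam-search decoding with a fairseq BART checkpoint;
# the checkpoint name and input text are placeholders for the cited model.
import torch

bart = torch.hub.load('pytorch/fairseq', 'bart.large')
bart.eval()  # disable dropout for inference

source_sentences = ['An example source sentence for the sequence-to-sequence model.']
with torch.no_grad():
    # Beam search with beam size 5, matching the quoted model details.
    hypotheses = bart.sample(source_sentences, beam=5)
print(hypotheses[0])
```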
“…To improve the quality of the encoder, we incorporate large-scale pretraining on millions of sequences of AMR by adopting the generative pretraining approach proposed in Lewis et al. (2019a). This pretraining incorporates various noise operations, such as masking (Devlin et al., 2019), span masking (Fan et al., 2019a), and shuffling.…”
Section: Encoding English Amrmentioning
confidence: 99%
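The three noise operations named in the last quote can be illustrated with the plain-Python sketch below, operating on token and sentence lists. The mask rate, span length, and the <mask> symbol are illustrative choices, not the exact settings of the cited work.

```python
# Illustrative noising functions: token masking, span masking, and shuffling.
import random

MASK = "<mask>"

def mask_tokens(tokens, rate=0.15):
    """Replace individual tokens with <mask> at the given rate."""
    return [MASK if random.random() < rate else t for t in tokens]

def mask_span(tokens, span_len=3):
    """Replace one contiguous span of tokens with a single <mask>."""
    if len(tokens) <= span_len:
        return [MASK]
    start = random.randrange(len(tokens) - span_len)
    return tokens[:start] + [MASK] + tokens[start + span_len:]

def shuffle_sentences(sentences):
    """Permute the order of sentences in a document."""
    shuffled = list(sentences)
    random.shuffle(shuffled)
    return shuffled

tokens = "the encoder sees corrupted text and the decoder reconstructs it".split()
print(mask_tokens(tokens))
print(mask_span(tokens))
print(shuffle_sentences(["First sentence.", "Second sentence.", "Third sentence."]))
```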