2020
DOI: 10.48550/arxiv.2001.08210
Preprint

Multilingual Denoising Pre-training for Neural Machine Translation

Abstract: This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, o…

Cited by 251 publications (127 citation statements)
References 34 publications
“…However, the success of the above DL-based methods relies heavily on large-scale datasets, posing a challenge for supervised and cross-domain text generation tasks. Since 2018, large-scale pre-trained language models (PLMs) such as BERT [Devlin et al. 2018], RoBERTa, GPT, T5 [Raffel et al. 2019] and mBART [Liu et al. 2020a] have gradually become a new paradigm of NLP. Owing to their use of large corpora and unsupervised learning based on the Transformer structure, PLMs are believed to have learned a great deal of semantic and syntactic knowledge from the data, and only fine-tuning is required for downstream tasks to reach state-of-the-art (SOTA) performance.…”
Section: AI Chatbot Story Generation (mentioning)
confidence: 99%
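As a rough illustration of the fine-tuning paradigm described in the statement above, the sketch below runs a single supervised update on a released mBART checkpoint. It assumes the Hugging Face transformers and torch packages and the public facebook/mbart-large-cc25 weights; it is a minimal sketch, not the cited papers' actual training recipe.

```python
# Minimal sketch: one fine-tuning step on a pre-trained mBART checkpoint
# (assumes Hugging Face `transformers` and `torch`; illustrative only).
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="en_XX", tgt_lang="ro_RO"
)

# A single (source, target) pair stands in for a downstream task's training data.
batch = tokenizer(
    ["UN Chief Says There Is No Military Solution in Syria"],
    text_target=["Şeful ONU declară că nu există o soluţie militară în Siria"],
    return_tensors="pt",
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**batch).loss   # seq2seq cross-entropy against the target tokens
loss.backward()
optimizer.step()
print(float(loss))
```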
“…Seq2seq Models: the seq2seq models use both the encoder and the decoder of the Transformer for better model flexibility. Currently, the most representative models of this type include T5 [Raffel et al. 2019] and mBART [Liu et al. 2020a]. In principle, almost all pre-training tasks used in AE and AR models can be adapted to seq2seq models.…”
Section: Output: Story Paragraph (mentioning)
confidence: 99%
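To make the encoder-decoder point concrete, here is a hedged sketch of running a released multilingual seq2seq checkpoint end to end: the encoder reads the source sentence and the decoder generates the target. The facebook/mbart-large-50-many-to-many-mmt model and the Hugging Face API are assumptions made for illustration, not something asserted by the quoted paper.

```python
# Sketch: the full encoder-decoder is exercised at inference time
# (assumes Hugging Face `transformers`; illustrative only).
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

tokenizer.src_lang = "en_XX"                      # language of the source sentence
inputs = tokenizer("The weather is nice today.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"],  # decode into German
    max_length=40,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```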
“…We take the idea of adding a special token id/tag to each input sentence in our mid-tuning dataset from Gao et al. (2020) and Liu et al. (2020). It helps the model to differentiate between the sentence s_a and the semantic form r.…”
Section: Learning and Aligning Encoders (mentioning)
confidence: 99%
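The "special token id/tag per input sentence" idea resembles the language-id tags that multilingual tokenizers attach to every sentence. The snippet below, which assumes the Hugging Face MBart50 tokenizer purely for illustration (it is not the cited paper's code), shows such a tag appearing in the tokenized sequence.

```python
# Sketch: a language tag is attached to every input sentence so the model can tell
# which language (or, by analogy, which kind of input) it is reading.
from transformers import MBart50TokenizerFast

tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer.src_lang = "en_XX"

ids = tokenizer("A tagged input sentence.")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# The printed tokens include the special `en_XX` tag alongside the word pieces,
# which is what lets the model differentiate inputs of different kinds.
```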
“…Zhang et al. (2020) suggest that the performance degradation results from limited multilingual NMT model capacity. Some research overcame such degradation by fine-tuning the whole model on the bilingual corpus (Neubig and Hu, 2018; Conneau and Lample, 2019; Liu et al., 2020). However, fine-tuning the whole model is parameter-inefficient: it consumes a large amount of storage to archive separate models for different translation pairs.…”
Section: Introduction (mentioning)
confidence: 99%
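As a rough illustration of the storage argument in the last statement, the sketch below freezes a pre-trained mBART model and leaves only its layer-norm parameters trainable, so each language pair would store a small parameter subset rather than a full model copy. This is a generic parameter-efficient-tuning sketch under assumed Hugging Face tooling, not the approach of the cited work.

```python
# Sketch: freeze the shared pre-trained model and train only a tiny, pair-specific
# subset (here: layer-norm parameters), avoiding one full model copy per language pair.
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

for name, param in model.named_parameters():
    # "layer_norm"/"layernorm" picks out the LayerNorm weights and biases in mBART.
    param.requires_grad = "layer_norm" in name or "layernorm" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
# An optimizer over only the trainable subset would then drive fine-tuning, e.g.:
# torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=3e-5)
```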