Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 2020
DOI: 10.18653/v1/2020.acl-demos.30

DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Abstract: We present a large, tunable neural conversational response generation model, DIALOGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant…
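Since the abstract describes DialoGPT as an extension of the Hugging Face PyTorch transformer stack, a minimal usage sketch may help make the single-turn setting concrete. It assumes the publicly released microsoft/DialoGPT-medium checkpoint on the Hugging Face Hub and a recent transformers/torch install; the decoding settings are illustrative, not the settings used in the paper.

```python
# Minimal usage sketch (not the authors' release scripts): load a DialoGPT
# checkpoint with the Hugging Face transformers API and decode one response.
# The checkpoint name and decoding settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
model.eval()

# DialoGPT models a dialogue as one token sequence, with turns separated by
# the end-of-sequence token, so the user turn is terminated with eos_token.
prompt = "Does money buy happiness?" + tokenizer.eos_token
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=128,
        do_sample=True,        # sampling tends to avoid overly generic replies
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens and keep only the newly generated response.
response = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```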

Cited by 694 publications (687 citation statements). References 23 publications.

Citation statements:
“…Qi et al. (2018) investigated the application of pre-trained word embeddings for MT; Ramachandran et al. (2017) proposed to pre-train the encoder-decoder modules as two separate language models. Yang et al. (2019a) and Zhu et al. (2020) explored fusion approaches that incorporate pre-trained BERT weights to improve NMT training. In contrast to most prior work, we focus on pre-training one denoising autoencoder and adapt the weights of the entire model for various MT applications.…”
Section: Related Work
confidence: 99%
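The contrast drawn in that statement (pre-training a single denoising autoencoder versus fusing BERT weights into an NMT model) can be pictured with a toy noising function. The sketch below is a hedged illustration in Python; the random token masking is an assumption for demonstration, not the cited work's exact corruption scheme.

```python
# Toy sketch of a denoising pre-training pair: corrupt the source text and ask
# a sequence-to-sequence model to reconstruct the original. The random token
# masking below is illustrative; the cited work may use different noise.
import random

def mask_tokens(tokens, mask_token="<mask>", p=0.15, seed=0):
    """Replace roughly a fraction p of the tokens with a mask symbol."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < p else tok for tok in tokens]

original = "we focus on pre-training one denoising autoencoder".split()
corrupted = mask_tokens(original)

# A (corrupted input, original target) pair is one training example for the
# denoising autoencoder; a downstream MT model is later adapted from its weights.
print(" ".join(corrupted), "=>", " ".join(original))
```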
“…They showed that pretrained models can be utilized for the open-domain dialogue generation task. Some works employ pretrained models to construct dialogue systems, such as DialoGPT [59], Blender [60], Meena [61], and PLATO-2 [62]. However, due to the huge cost of training a pretrained model for the open-domain dialogue generation task, this approach is not suitable for individual researchers.…”
Section: Pretraining-model-based Methods
confidence: 99%
“…DialoGPT: Zhang et al. [59] proposed a large, tunable dialogue model named DialoGPT that is based on the GPT-2 model [63]. They also introduced Maximum Mutual Information (MMI) scoring to address the problem of dull responses.…”
Section: Pretraining-model-based Methods
confidence: 99%
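The MMI step mentioned in that statement can be sketched as candidate reranking: sample several responses from the forward model, then keep the one under which a "backward" model assigns the highest likelihood to the source utterance. The backward checkpoint path below is a placeholder assumption, and the scoring is a simplified illustration of the idea rather than the authors' released implementation.

```python
# Hedged sketch of MMI-style reranking: sample candidate responses, then score
# each with a backward model that predicts the source from the response, and
# keep the highest-scoring candidate. "path/to/backward-dialogpt" is a
# placeholder, not an official checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
forward_model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
backward_model = AutoModelForCausalLM.from_pretrained("path/to/backward-dialogpt")  # assumed

source = "Does money buy happiness?"
src_ids = tok.encode(source + tok.eos_token, return_tensors="pt")

# 1) Sample a pool of candidate responses from the forward model.
candidates = forward_model.generate(
    src_ids,
    max_length=96,
    do_sample=True,
    top_k=50,
    num_return_sequences=8,
    pad_token_id=tok.eos_token_id,
)

def backward_score(full_ids: torch.Tensor) -> float:
    """Log-likelihood of the source given the response, under the backward model."""
    response_ids = full_ids[src_ids.shape[-1]:].unsqueeze(0)   # generated part only
    ids = torch.cat([response_ids, src_ids], dim=-1)           # response followed by source
    labels = ids.clone()
    labels[:, : response_ids.shape[-1]] = -100                 # score only the source tokens
    with torch.no_grad():
        loss = backward_model(ids, labels=labels).loss         # mean NLL of source tokens
    return -loss.item()

# 2) Rerank and keep the candidate whose backward score is highest.
best = max(candidates, key=backward_score)
print(tok.decode(best[src_ids.shape[-1]:], skip_special_tokens=True))
```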
“…DialoGPT (Zhang et al., 2019) is a GPT-2 model pretrained on English Reddit dialogues. The dataset is extracted from Reddit comment chains from 2005 through 2017, comprising 147,116,725 dialogue instances with 1.8 billion tokens.…”
Section: Methods
confidence: 99%
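One way to picture how such comment chains become model inputs is to flatten each dialogue session into a single token sequence, with turns separated by the end-of-text token. The sketch below is an illustrative assumption about that preprocessing, not the authors' extraction pipeline.

```python
# Illustrative sketch only: flatten one Reddit comment chain into a single
# token-id sequence with turns separated by the end-of-text token, which is
# the instance format a GPT-2-style dialogue model is trained on.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def build_training_instance(turns):
    """Join the turns of one comment chain into one token-id sequence."""
    text = tokenizer.eos_token.join(turns) + tokenizer.eos_token
    return tokenizer.encode(text)

dialogue = [
    "what is the best way to learn a new language?",
    "practice a little every day and talk to native speakers",
    "thanks, that makes sense",
]
print(build_training_instance(dialogue))
```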
“…On these two datasets, we train several dialogue generation models based on the Transformer (Vaswani et al., 2017), GPT (Radford et al., a; Zhang et al., 2019), and BERT-GPT (Wu et al., 2019; Lewis et al., 2019). The Transformer is an encoder-decoder architecture that takes the conversation history as input and generates the response.…”
Section: Introduction
confidence: 99%
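For the encoder-decoder setup described in that statement (conversation history in, response out), a rough sketch is possible with the transformers EncoderDecoderModel. The BERT-encoder plus GPT-2-decoder pairing here is an assumption in the spirit of BERT-GPT; its cross-attention is untrained, so the output only demonstrates the input/output contract, not useful responses.

```python
# Hedged sketch of an encoder-decoder dialogue model: the encoder consumes the
# conversation history and the decoder generates the response. The BERT + GPT-2
# pairing is an illustrative assumption; its cross-attention weights are not
# fine-tuned here, so the generated text is not meaningful.
from transformers import AutoTokenizer, EncoderDecoderModel

enc_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
dec_tok = AutoTokenizer.from_pretrained("gpt2")

model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.eos_token_id

# The conversation history is packed into the encoder input; [SEP] marks turns.
history = "hi , how are you ? [SEP] pretty good , just got back from a hike ."
inputs = enc_tok(history, return_tensors="pt")

output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=32,
)
print(dec_tok.decode(output_ids[0], skip_special_tokens=True))
```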