Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 2020
DOI: 10.18653/v1/2020.acl-demos.30

DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Abstract: We present a large, tunable neural conversational response generation model, DIALOGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant…
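Since the abstract describes DialoGPT as an extension of the Hugging Face PyTorch transformer stack, a minimal usage sketch may help make the single-turn setting concrete. It assumes the publicly released microsoft/DialoGPT-medium checkpoint on the Hugging Face Hub and a recent transformers/torch install; the decoding settings are illustrative, not the settings used in the paper.

```python
# Minimal usage sketch (not the authors' release scripts): load a DialoGPT
# checkpoint with the Hugging Face transformers API and decode one response.
# The checkpoint name and decoding settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
model.eval()

# DialoGPT models a dialogue as one token sequence, with turns separated by
# the end-of-sequence token, so the user turn is terminated with eos_token.
prompt = "Does money buy happiness?" + tokenizer.eos_token
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=128,
        do_sample=True,        # sampling tends to avoid overly generic replies
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens and keep only the newly generated response.
response = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```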

Cited by 694 publications (687 citation statements). References 23 publications.

Citation statements:
“…Qi et al. (2018) investigated the application of pre-trained word embeddings for MT; Ramachandran et al. (2017) proposed to pre-train the encoder-decoder modules as two separate language models. Yang et al. (2019a) and Zhu et al. (2020) explored fusion approaches that incorporate pre-trained BERT weights to improve NMT training. In contrast to most prior work, we focus on pre-training one denoising autoencoder and adapt the weights of the entire model for various MT applications.…”
Section: Related Work
confidence: 99%
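The contrast drawn in that statement (pre-training a single denoising autoencoder versus fusing BERT weights into an NMT model) can be pictured with a toy noising function. The sketch below is a hedged illustration in Python; the random token masking is an assumption for demonstration, not the cited work's exact corruption scheme.

```python
# Toy sketch of a denoising pre-training pair: corrupt the source text and ask
# a sequence-to-sequence model to reconstruct the original. The random token
# masking below is illustrative; the cited work may use different noise.
import random

def mask_tokens(tokens, mask_token="<mask>", p=0.15, seed=0):
    """Replace roughly a fraction p of the tokens with a mask symbol."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < p else tok for tok in tokens]

original = "we focus on pre-training one denoising autoencoder".split()
corrupted = mask_tokens(original)

# A (corrupted input, original target) pair is one training example for the
# denoising autoencoder; a downstream MT model is later adapted from its weights.
print(" ".join(corrupted), "=>", " ".join(original))
```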
“…They showed that pretrained models can be utilized for the open-domain dialogue generation task. Some works employ pretrained models to construct dialogue systems, such as DialoGPT [59], Blender [60], Meena [61], and PLATO-2 [62]. However, due to the huge cost of training a pretrained model for the open-domain dialogue generation task, this approach is not suitable for individual researchers.…”
Section: Pretraining-model-based Methods
confidence: 99%
“…DialoGPT: Zhang et al. [59] proposed a large, tunable dialogue model named DialoGPT that is based on the GPT-2 model [63]. They also introduced Maximum Mutual Information (MMI) scoring to address the problem of dull responses.…”
Section: Pretraining-model-based Methods
confidence: 99%
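The MMI step mentioned in that statement can be sketched as candidate reranking: sample several responses from the forward model, then keep the one under which a "backward" model assigns the highest likelihood to the source utterance. The backward checkpoint path below is a placeholder assumption, and the scoring is a simplified illustration of the idea rather than the authors' released implementation.

```python
# Hedged sketch of MMI-style reranking: sample candidate responses, then score
# each with a backward model that predicts the source from the response, and
# keep the highest-scoring candidate. "path/to/backward-dialogpt" is a
# placeholder, not an official checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
forward_model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
backward_model = AutoModelForCausalLM.from_pretrained("path/to/backward-dialogpt")  # assumed

source = "Does money buy happiness?"
src_ids = tok.encode(source + tok.eos_token, return_tensors="pt")

# 1) Sample a pool of candidate responses from the forward model.
candidates = forward_model.generate(
    src_ids,
    max_length=96,
    do_sample=True,
    top_k=50,
    num_return_sequences=8,
    pad_token_id=tok.eos_token_id,
)

def backward_score(full_ids: torch.Tensor) -> float:
    """Log-likelihood of the source given the response, under the backward model."""
    response_ids = full_ids[src_ids.shape[-1]:].unsqueeze(0)   # generated part only
    ids = torch.cat([response_ids, src_ids], dim=-1)           # response followed by source
    labels = ids.clone()
    labels[:, : response_ids.shape[-1]] = -100                 # score only the source tokens
    with torch.no_grad():
        loss = backward_model(ids, labels=labels).loss         # mean NLL of source tokens
    return -loss.item()

# 2) Rerank and keep the candidate whose backward score is highest.
best = max(candidates, key=backward_score)
print(tok.decode(best[src_ids.shape[-1]:], skip_special_tokens=True))
```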
“…DialoGPT (Zhang et al., 2019) is a GPT-2 model pretrained on English Reddit dialogues. The dataset is extracted from Reddit comment chains from 2005 through 2017, comprising 147,116,725 dialogue instances with 1.8 billion tokens.…”
Section: Methods
confidence: 99%
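One way to picture how such comment chains become model inputs is to flatten each dialogue session into a single token sequence, with turns separated by the end-of-text token. The sketch below is an illustrative assumption about that preprocessing, not the authors' extraction pipeline.

```python
# Illustrative sketch only: flatten one Reddit comment chain into a single
# token-id sequence with turns separated by the end-of-text token, which is
# the instance format a GPT-2-style dialogue model is trained on.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def build_training_instance(turns):
    """Join the turns of one comment chain into one token-id sequence."""
    text = tokenizer.eos_token.join(turns) + tokenizer.eos_token
    return tokenizer.encode(text)

dialogue = [
    "what is the best way to learn a new language?",
    "practice a little every day and talk to native speakers",
    "thanks, that makes sense",
]
print(build_training_instance(dialogue))
```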
“…On these two datasets, we train several dialogue generation models based on the Transformer (Vaswani et al., 2017), GPT (Radford et al., a; Zhang et al., 2019), and BERT-GPT (Wu et al., 2019; Lewis et al., 2019). The Transformer is an encoder-decoder architecture that takes the conversation history as input and generates the response.…”
Section: Introduction
confidence: 99%
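For the encoder-decoder setup described in that statement (conversation history in, response out), a rough sketch is possible with the transformers EncoderDecoderModel. The BERT-encoder plus GPT-2-decoder pairing here is an assumption in the spirit of BERT-GPT; its cross-attention is untrained, so the output only demonstrates the input/output contract, not useful responses.

```python
# Hedged sketch of an encoder-decoder dialogue model: the encoder consumes the
# conversation history and the decoder generates the response. The BERT + GPT-2
# pairing is an illustrative assumption; its cross-attention weights are not
# fine-tuned here, so the generated text is not meaningful.
from transformers import AutoTokenizer, EncoderDecoderModel

enc_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
dec_tok = AutoTokenizer.from_pretrained("gpt2")

model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.eos_token_id

# The conversation history is packed into the encoder input; [SEP] marks turns.
history = "hi , how are you ? [SEP] pretty good , just got back from a hike ."
inputs = enc_tok(history, return_tensors="pt")

output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=32,
)
print(dec_tok.decode(output_ids[0], skip_special_tokens=True))
```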