2020
DOI: 10.48550/arxiv.2004.04100
Preprint
KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

Cited by 7 publications (7 citation statements)
References 27 publications
“…This paper chooses XLNet [11], a widely-used Question & Answering generative natural language model. Due to the large amount of pre-trained data available for XLNet, this paper uses the Kdconv [12] open-source corpus as the fine-tuning dataset for XLNet. However, as the movie and music information in the Kdconv corpus is outdated, additional data is crawled from the Douban Movie website and QQ Music website using Python to supplement the response content of the companion robot.…”
Section: Figure 1: System Architecture of the Companion Chatbot (mentioning)
confidence: 99%
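The statement above describes a pipeline rather than giving code: XLNet is fine-tuned on the KdConv corpus, and the chatbot's responses are supplemented with freshly crawled movie and music data. A minimal, hypothetical sketch of the fine-tuning step is given below using Hugging Face Transformers; the checkpoint name (hfl/chinese-xlnet-base), the file path, and the KdConv JSON field names ("messages", "message") are assumptions for illustration, not details taken from the cited paper.

```python
# Hypothetical sketch: fine-tuning a Chinese XLNet checkpoint on KdConv-style
# dialogue text. Checkpoint name, file paths, and JSON field names are
# assumptions, not taken from the cited paper.
import json

import torch
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    XLNetLMHeadModel,
    DataCollatorForPermutationLanguageModeling,
    Trainer,
    TrainingArguments,
)


def load_kdconv_texts(path):
    """Flatten each KdConv dialogue into one training string per turn pair."""
    with open(path, encoding="utf-8") as f:
        dialogues = json.load(f)
    texts = []
    for dialogue in dialogues:
        turns = [m["message"] for m in dialogue["messages"]]
        # Pair every utterance with its reply: context followed by response.
        for ctx, resp in zip(turns[:-1], turns[1:]):
            texts.append(ctx + " " + resp)
    return texts


tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-base")
model = XLNetLMHeadModel.from_pretrained("hfl/chinese-xlnet-base")

texts = load_kdconv_texts("kdconv/film/train.json")
# Pad to a fixed even length: the permutation-LM collator requires even lengths.
encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=128)


class KdConvDataset(Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        return {"input_ids": torch.tensor(self.encodings["input_ids"][idx])}


# Permutation-LM collator matches XLNet's pretraining objective.
collator = DataCollatorForPermutationLanguageModeling(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xlnet-kdconv",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=KdConvDataset(encodings),
    data_collator=collator,
)
trainer.train()
```

The crawled Douban Movie and QQ Music data mentioned in the quote would be merged into the response content separately; it is not part of this fine-tuning sketch.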
“…It includes 2,425, 302, and 304 utterances for the training, validation, and test sets, respectively. KdConv contains 4,500 Chinese dialogues from the film, music, and travel domains, with an average of 19 turns per dialogue and a total of 86,000 sentences [27]. These dialogues include in-depth discussions on relevant topics and natural transitions between multiple topics.…”
Section: Related Work (mentioning)
confidence: 99%
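As a sanity check on the figures quoted above (roughly 4,500 dialogues, about 86,000 utterances, around 19 utterances per dialogue), the statistics can be recomputed from the released KdConv JSON files. The sketch below is hypothetical: the per-domain file layout and the "messages" field name follow the public thu-coai/KdConv repository but are assumptions here, not something the citing paper describes.

```python
# Hypothetical sketch: recomputing KdConv corpus statistics from its JSON files.
import json


def corpus_stats(paths):
    n_dialogues, n_utterances = 0, 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            dialogues = json.load(f)
        n_dialogues += len(dialogues)
        n_utterances += sum(len(d["messages"]) for d in dialogues)
    avg_turns = n_utterances / n_dialogues if n_dialogues else 0.0
    return n_dialogues, n_utterances, avg_turns


# Expected, per the paper: ~4,500 dialogues, ~86,000 utterances,
# ~19 utterances per dialogue across the film, music, and travel domains.
paths = [
    f"kdconv/{domain}/{split}.json"
    for domain in ("film", "music", "travel")
    for split in ("train", "dev", "test")
]
print("dialogues=%d utterances=%d avg_turns=%.1f" % corpus_stats(paths))
```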
“…While user-centered dialog datasets have appeared, datasets and agents that aim to improve the level of knowledge in the answer with additional documents have been released in parallel (Dinan et al. 2018; Zhou, Prabhumoye, and Black 2018; Moghe et al. 2018; Qin et al. 2019; Gopalakrishnan et al. 2019; Cho and May 2020; Zhou et al. 2020; Santhanam et al. 2020). Dinan et al. (2018) present a dialog dataset where the agent retrieves Wikipedia pages on diverse topics and generates responses to the questions.…”
Section: Related Work (mentioning)
confidence: 99%