2020
DOI: 10.48550/arxiv.2004.04100
Preprint
KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

Cited by 7 publications (7 citation statements)
References 27 publications
“…This paper chooses XLNet [11], a widely-used Question & Answering generative natural language model. Due to the large amount of pre-trained data available for XLNet, this paper uses the Kdconv [12] open-source corpus as the fine-tuning dataset for XLNet. However, as the movie and music information in the Kdconv corpus is outdated, additional data is crawled from the Douban Movie website and QQ Music website using Python to supplement the response content of the companion robot.…”
Section: Figure 1: System Architecture of the Companion Chatbot (mentioning)
confidence: 99%
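The statement above describes a pipeline rather than giving code: XLNet is fine-tuned on the KdConv corpus, and the chatbot's responses are supplemented with freshly crawled movie and music data. A minimal, hypothetical sketch of the fine-tuning step is given below using Hugging Face Transformers; the checkpoint name (hfl/chinese-xlnet-base), the file path, and the KdConv JSON field names ("messages", "message") are assumptions for illustration, not details taken from the cited paper.

```python
# Hypothetical sketch: fine-tuning a Chinese XLNet checkpoint on KdConv-style
# dialogue text. Checkpoint name, file paths, and JSON field names are
# assumptions, not taken from the cited paper.
import json

import torch
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    XLNetLMHeadModel,
    DataCollatorForPermutationLanguageModeling,
    Trainer,
    TrainingArguments,
)


def load_kdconv_texts(path):
    """Flatten each KdConv dialogue into one training string per turn pair."""
    with open(path, encoding="utf-8") as f:
        dialogues = json.load(f)
    texts = []
    for dialogue in dialogues:
        turns = [m["message"] for m in dialogue["messages"]]
        # Pair every utterance with its reply: context followed by response.
        for ctx, resp in zip(turns[:-1], turns[1:]):
            texts.append(ctx + " " + resp)
    return texts


tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-base")
model = XLNetLMHeadModel.from_pretrained("hfl/chinese-xlnet-base")

texts = load_kdconv_texts("kdconv/film/train.json")
# Pad to a fixed even length: the permutation-LM collator requires even lengths.
encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=128)


class KdConvDataset(Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        return {"input_ids": torch.tensor(self.encodings["input_ids"][idx])}


# Permutation-LM collator matches XLNet's pretraining objective.
collator = DataCollatorForPermutationLanguageModeling(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xlnet-kdconv",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=KdConvDataset(encodings),
    data_collator=collator,
)
trainer.train()
```

The crawled Douban Movie and QQ Music data mentioned in the quote would be merged into the response content separately; it is not part of this fine-tuning sketch.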
“…It includes 2,425, 302, and 304 utterances for the training, validation, and test sets, respectively. KdConv contains 4,500 Chinese dialogues from the film, music, and travel domains, with an average of 19 turns per dialogue and a total of 86,000 sentences [27]. These dialogues include in-depth discussions on relevant topics and natural transitions between multiple topics.…”
Section: Related Work (mentioning)
confidence: 99%
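As a sanity check on the figures quoted above (roughly 4,500 dialogues, about 86,000 utterances, around 19 utterances per dialogue), the statistics can be recomputed from the released KdConv JSON files. The sketch below is hypothetical: the per-domain file layout and the "messages" field name follow the public thu-coai/KdConv repository but are assumptions here, not something the citing paper describes.

```python
# Hypothetical sketch: recomputing KdConv corpus statistics from its JSON files.
import json


def corpus_stats(paths):
    n_dialogues, n_utterances = 0, 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            dialogues = json.load(f)
        n_dialogues += len(dialogues)
        n_utterances += sum(len(d["messages"]) for d in dialogues)
    avg_turns = n_utterances / n_dialogues if n_dialogues else 0.0
    return n_dialogues, n_utterances, avg_turns


# Expected, per the paper: ~4,500 dialogues, ~86,000 utterances,
# ~19 utterances per dialogue across the film, music, and travel domains.
paths = [
    f"kdconv/{domain}/{split}.json"
    for domain in ("film", "music", "travel")
    for split in ("train", "dev", "test")
]
print("dialogues=%d utterances=%d avg_turns=%.1f" % corpus_stats(paths))
```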
“…While user-centered dialog datasets have appeared, datasets and agents that aim to improve the level of knowledge in the answer with additional documents have been released in parallel (Dinan et al. 2018; Zhou, Prabhumoye, and Black 2018; Moghe et al. 2018; Qin et al. 2019; Gopalakrishnan et al. 2019; Cho and May 2020; Zhou et al. 2020; Santhanam et al. 2020). Dinan et al. (2018) present a dialog dataset where the agent retrieves Wikipedia pages on diverse topics and generates responses to the questions.…”
Section: Related Work (mentioning)
confidence: 99%