CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

Zhu, Qi; Huang, Kaili; Zhang, Zheng; Zhu, Xiaoyan; Huang, Minlie

doi:10.1162/tacl_a_00314

Cited by 80 publications (78 citation statements)

References 16 publications

“…These datasets have higher language variation and task complexity. While most datasets are in English, Zhu et al [72] propose the first large-scale Chinese task-oriented dataset with rich annotations to facilitate the research of Chinese and cross-lingual dialog modeling. An incomplete survey on these dialog datasets is presented in Table 2.…”

Section: Corpora (mentioning)

confidence: 99%

Recent advances and challenges in task-oriented dialog systems

Zhang, Takanobu, Huang, et al. 2020

Sci. China Technol. Sci.

Self Cite

120

Due to their significance and value in human-computer interaction and natural language processing, task-oriented dialog systems are attracting increasing attention in both academic and industrial communities. In this paper, we survey recent advances and challenges in task-oriented dialog systems. We also discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog policy learning to achieve better task-completion performance, and (3) integrating domain ontology knowledge into the dialog model. In addition, we review recent progress in dialog evaluation and some widely used corpora. We believe that this survey, though incomplete, can shed light on future research in task-oriented dialog systems.

Keywords: task-oriented dialog systems, natural language understanding, dialog policy, dialog state tracking, natural language generation

“…ATIS (Hemphill et al, 1990), WOZ 2.0 (Wen et al, 2017), FRAMES (El Asri et al, 2017) and KVRET (Eric et al, 2017) are small-scale datasets built in this way. In contrast, MultiWOZ (Budzianowski et al, 2018) and CrossWOZ (Zhu et al, 2020) are two large-scale H2H datasets proposed recently.…”

Section: Related Work (mentioning)

confidence: 99%

“…In recent years, we have witnessed that a variety of datasets tailored for task-oriented dialogue have been constructed, such as MultiWOZ (Budzianowski et al, 2018), SGD (Rastogi et al, 2019a) and CrossWOZ (Zhu et al, 2020), along with the increasing interest in conversational AI in both academia and industry (Gao et al, 2018). These datasets have triggered extensive research in either end-to-end or traditional modular task-oriented dialogue modeling (Wen et al, 2019; Dai et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

“…MultiWOZ (Budzianowski et al, 2018), probably the most promising and notable dialogue corpus collected in a Wizard-of-Oz (i.e., Human-to-Human) way recently, is one order of magnitude larger than the aforementioned corpora collected in the same way. However, it contains noisy system-side state annotations and lacks user-side dialogue acts (Eric et al, 2019; Zhu et al, 2020). Yet another very recent dataset CrossWOZ (Zhu et al, 2020), the first large-scale Chinese H2H dataset for task-oriented dialogue, provides semantic annotations on both user and system side although it is relatively smaller than MultiWOZ.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, it contains noisy system-side state annotations and lacks user-side dialogue acts (Eric et al, 2019; Zhu et al, 2020). Yet another very recent dataset CrossWOZ (Zhu et al, 2020), the first large-scale Chinese H2H dataset for task-oriented dialogue, provides semantic annotations on both user and system side although it is relatively smaller than MultiWOZ. The number of domains in both MultiWOZ and CrossWOZ is fewer than 10.…”

Section: Introduction (mentioning)

confidence: 99%

RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling

Quan, Zhang, Cao, et al. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In order to alleviate the shortage of multi-domain data and to capture discourse phenomena for task-oriented dialogue modeling, we propose RiSAWOZ, a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations. RiSAWOZ contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning 12 domains, which is larger than all previous annotated H2H conversational datasets. Both single- and multi-domain dialogues are constructed, accounting for 65% and 35% of the dataset, respectively. Each dialogue is labeled with comprehensive dialogue annotations, including the dialogue goal in the form of a natural language description, the domain, and dialogue states and acts on both the user and system side. In addition to traditional dialogue annotations, we provide linguistic annotations on discourse phenomena in dialogues, e.g., ellipsis and coreference, which are useful for dialogue coreference and ellipsis resolution tasks. Apart from the fully annotated dataset, we also present a detailed description of the data collection procedure, along with statistics and analysis of the dataset. A series of benchmark models and results are reported, covering natural language understanding (intent detection and slot filling), dialogue state tracking, dialogue context-to-text generation, and coreference and ellipsis resolution, to facilitate baseline comparison in future research on this corpus.
