MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

Budzianowski, Paweł; Wen, Tsung-Hsien; Tseng, Bo-Hsiang; Casanueva, Iñigo; Ultes, Stefan; Ramadan, Osman; Gašić, Milica

doi:10.18653/v1/d18-1547

Cited by 739 publications

(1,070 citation statements)

References 43 publications

Supporting

Mentioning

920

Contrasting

Unclassified


“…However, this tool doesn't allow users to specify custom annotations or labels and doesn't support classification or slot-value annotation. This is not compatible with modern dialogue datasets which require such annotations (Budzianowski et al, 2018). INCEpTION (Klie et al, 2018) is a semantic annotation platform for interactive tasks that require semantic resources like entity linking.…”

Section: Main Contributions (mentioning)

confidence: 99%

“…1 https://github.com/Wluper/lida Creating a high-quality dialogue dataset incurs a large annotation cost, which makes good dialogue annotation tools essential to ensure the highest possible quality. Many annotation tools exist for a range of NLP tasks but none are designed specifically for dialogue with modern usability principles in mind - in collecting MultiWOZ, for example, Budzianowski et al (2018) had to create a bespoke annotation interface.…”

Section: Introduction (mentioning)

confidence: 99%


LIDA: Lightweight Interactive Dialogue Annotator

Collins

Rozanov

Zhang

2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


Dialogue systems have the potential to change how people interact with machines but are highly dependent on the quality of the data used to train them. It is therefore important to develop good dialogue annotation tools which can improve the speed and quality of dialogue data annotation. With this in mind, we introduce LIDA, an annotation tool designed specifically for conversation data. As far as we know, LIDA is the first dialogue annotation system that handles the entire dialogue annotation pipeline from raw text, as may be the output of transcription services, to structured conversation data. Furthermore it supports the integration of arbitrary machine learning models as annotation recommenders and also has a dedicated interface to resolve inter-annotator disagreements such as after crowdsourcing annotations for a dataset. LIDA is fully open source, documented and publicly available (https://github.com/Wluper/lida).


“…For the second set of experiments, we evaluate the proposed model and TL approaches on the multi-turn Google Simulated Dialogues (GSD) 3 [7]. We explore Microsoft Dialogue Challenge (MDC) 4 [31] and MultiWOZ 2.0 (WOZ) 5 [32] datasets as other dialogue corpora for evaluating the proposed TL approaches. We use the same data division as [7].…”

Section: Data (mentioning)

confidence: 99%

Transfer Learning for Context-Aware Spoken Language Understanding

Qian

Zhu

Wang

et al. 2019

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)


Spoken language understanding (SLU) is a key component of task-oriented dialogue systems. SLU parses natural language user utterances into semantic frames. Previous work has shown that incorporating context information significantly improves SLU performance for multi-turn dialogues. However, collecting a large-scale human-labeled multi-turn dialogue corpus for the target domains is complex and costly. To reduce dependency on the collection and annotation effort, we propose a Context Encoding Language Transformer (CELT) model facilitating exploiting various context information for SLU. We explore different transfer learning approaches to reduce dependency on data collection and annotation. In addition to unsupervised pre-training using large-scale general purpose unlabeled corpora, such as Wikipedia, we explore unsupervised and supervised adaptive training approaches for transfer learning to benefit from other in-domain and out-of-domain dialogue corpora. Experimental results demonstrate that the proposed model with the proposed transfer learning approaches achieves significant improvement on the SLU performance over state-of-the-art models on two large-scale single-turn dialogue benchmarks and one large-scale multi-turn dialogue benchmark.


“…Task-oriented dialogue systems are primarily designed to search and interact with large databases which contain information pertaining to a certain dialogue domain: the main purpose of such systems is to assist the users in accomplishing a well-defined task such as flight booking (El Asri et al, 2017), tourist information (Henderson et al, 2014), restaurant search (Williams, 2012), or booking a taxi (Budzianowski et al, 2018). These systems are typically constructed around rigid task-specific ontologies (Henderson et al, 2014; Mrkšić et al, 2015) which enumerate the constraints the users can express using a collection of slots (e.g., PRICE RANGE for restaurant search) and their slot values (e.g., CHEAP, EXPENSIVE for the aforementioned slots).…”

Section: Introduction (mentioning)

confidence: 99%

PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking

Henderson

Vulić

Casanueva

et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Self Cite


We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-oriented dialogue systems and the use of explicit semantics in the form of task-specific ontologies. The PolyResponse engine is trained on hundreds of millions of examples extracted from real conversations: it learns what responses are appropriate in different conversational contexts. It then ranks a large index of text and visual responses according to their similarity to the given context, and narrows down the list of relevant entities during the multi-turn conversation. We introduce a restaurant search and booking system powered by the PolyResponse engine, currently available in 8 different languages.
