MultiWOZ 2.3: A multi-domain task-oriented dialogue dataset enhanced with annotation corrections and co-reference annotation

Han, Ting; Liu, Ximing; Takanobu, Ryuichi; Lian, Yixin; Huang, Chongxuan; Wan, Dazhen; Peng, Wei; Huang, Minlie

doi:10.48550/arxiv.2010.05594

Cited by 15 publications

(25 citation statements)

References 27 publications


“…Dialogue state tracking (DST) refers to the task of predicting a formal state of a dialogue at its current turn, as a set of slot-value pairs at every turn. State-of-the-art approaches apply large transformer networks (Peng et al, 2020; Hosseini-Asl et al, 2020) to encode the full dialogue history in order to predict slot values. Other approaches include question-answering models, ontology matching in the finite case, or pointer-generator networks (Wu et al, 2019).…”

Section: Dialogue State Tracking (mentioning)

confidence: 99%


Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues

Moradshahi, Tsai, Campagna et al. 2021. Preprint.

Robust state tracking for task-oriented dialogue systems currently remains restricted to a few popular languages. This paper shows that given a large-scale dialogue data set in one language, we can automatically produce an effective semantic parser for other languages using machine translation. We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values and eliminate costly human supervision used in previous benchmarks. We also propose a new contextual semantic parsing model, which encodes the formal slots and values, and only the last agent and user utterances. We show that the succinct representation reduces the compounding effect of translation errors, without harming the accuracy in practice. We evaluate our approach on several dialogue state tracking benchmarks. On RiSAWOZ, CrossWOZ, CrossWOZ-EN, and MultiWOZ-ZH datasets we improve the state of the art by 11%, 17%, 20%, and 0.3% in joint goal accuracy. We present a comprehensive error analysis for all three datasets showing erroneous annotations can obscure judgments on the quality of the model. Finally, we present RiSAWOZ English and German datasets, created using our translation methodology. On these datasets, accuracy is within 11% of the original showing that high-accuracy multilingual dialogue datasets are possible without relying on expensive human annotations.
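The abstract above reports results in joint goal accuracy. As a rough illustration, this metric is commonly computed as the fraction of turns whose predicted dialogue state (the full set of slot-value pairs) exactly matches the gold state. The following is a minimal Python sketch under that assumption; the function name `joint_goal_accuracy`, the slot names, and the toy states are hypothetical and are not taken from any of the papers listed here.

```python
from typing import Dict, List

# A dialogue state is modeled here as a mapping from slot name to value.
# This representation is an assumption made for illustration only.
State = Dict[str, str]


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state equals the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    # A turn counts as correct only if every slot-value pair matches.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Toy three-turn dialogue with made-up slot names.
gold = [
    {"restaurant-pricerange": "expensive"},
    {"restaurant-pricerange": "expensive", "restaurant-bookday": "wednesday"},
    {
        "restaurant-pricerange": "expensive",
        "restaurant-bookday": "wednesday",
        "restaurant-bookpeople": "7",
    },
]
predicted = [
    {"restaurant-pricerange": "expensive"},
    # One wrong slot value makes the whole turn count as incorrect.
    {"restaurant-pricerange": "expensive", "restaurant-bookday": "tuesday"},
    {
        "restaurant-pricerange": "expensive",
        "restaurant-bookday": "wednesday",
        "restaurant-bookpeople": "7",
    },
]

score = joint_goal_accuracy(predicted, gold)
```

Because a single wrong slot invalidates the entire turn, one mislabeled gold slot can likewise mark a correct prediction as wrong, which may be part of why the citing papers treat annotation errors as an obstacle to evaluation.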



“…Following prior work with this dataset, we drop hospital and police from the training set as they are not included in the validation and test set. After the release of MultiWOZ 2.0 (Budzianowski et al, 2018), later iterations (Eric et al, 2019; Zang et al, 2020; Han et al, 2020) corrected some of the misannotations.…”

Section: Datasets (mentioning)

confidence: 99%

Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues

Moradshahi, Tsai, Campagna et al. 2021. Preprint.

“…Annotation error Even the recent versions of MultiWOZ still have incorrect labels and inconsistent annotations [3,21,4,20,5]. These noises are the primary reason why it is challenging to accurately evaluate the model performance.…”

Section: Data Limitation (mentioning)

confidence: 99%

“…Whereas the MultiWOZ has been used as a standard benchmark dataset for DST, there has been an increasing number of recent studies reporting the concerns regarding the inherent limitations of this dataset. First, newer versions of MultiWOZ have been proposed to address certain issues such as annotation errors, typos, standardization, annotation consistency, and other factors [3,21,4,20]. In addition, Qian et al [12] pointed out an entity bias issue, i.e., only a small number of values in the ontology account for the majority of labels.…”

Section: Introduction (mentioning)

confidence: 99%

Oh My Mistake!: Toward Realistic Dialogue State Tracking including Turnback Utterances

Takyoung, Lee, Yoon et al. 2021. Preprint.

The primary purpose of dialogue state tracking (DST), a critical component of an end-to-end conversational system, is to build a model that responds well to real-world situations. Although we often change our minds during ordinary conversations, current benchmark datasets do not adequately reflect such occurrences and instead consist of over-simplified conversations, in which no one changes their mind during a conversation. As the main question inspiring the present study, "Are current benchmark datasets sufficiently diverse to handle casual conversations in which one changes their mind?" We found that the answer is "No" because simply injecting template-based turnback utterances significantly degrades the DST model performance. The test joint goal accuracy on the MultiWOZ decreased by over 5%p when the simplest form of turnback utterance was injected. Moreover, the performance degeneration worsens when facing more complicated turnback situations. However, we also observed that the performance rebounds when a turnback is appropriately included in the training dataset, implying that the problem is not with the DST models but rather with the construction of the benchmark dataset.

“…As a matter of fact, massive efforts have already been made to further improve the annotation quality of MultiWOZ 2.1, resulting in MultiWOZ 2.2 and MultiWOZ 2.3 (Han et al, 2020b). Nonetheless, they both have some limitations.…”

Section: Introduction (mentioning)

confidence: 99%

MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with Essential Annotation Corrections to Improve State Tracking Evaluation

Ye, Manotumruksa, Yılmaz 2021. Preprint.

The MultiWOZ 2.0 dataset was released in 2018. It consists of more than 10,000 task-oriented dialogues spanning 7 domains, and has greatly stimulated the research of task-oriented dialogue systems. However, there is substantial noise in the state annotations, which hinders a proper evaluation of dialogue state tracking models. To tackle this issue, massive efforts have been devoted to correcting the annotations, resulting in 3 improved versions of this dataset (i.e., MultiWOZ 2.1-2.3). Even so, there are still lots of incorrect and inconsistent annotations. This work introduces MultiWOZ 2.4, in which we refine all annotations in the validation set and test set on top of MultiWOZ 2.1. The annotations in the training set remain unchanged to encourage robust and noise-resilient model training. We further benchmark 8 state-of-the-art dialogue state tracking models. All these models achieve much higher performance on MultiWOZ 2.4 than on MultiWOZ 2.1.
