More Diverse Dialogue Datasets via Diversity-Informed Data Collection

Stasaski, Katherine; Yang, Grace Hui; Hearst, Marti A.

doi:10.18653/v1/2020.acl-main.446

Cited by 14 publications

(7 citation statements)

References 25 publications


“…Other past work has incorporated diversity metrics into the dialogue dataset creation pipeline. Stasaski et al (2020) propose a method which measures the diversity of a crowdworker's contributions compared to a corpus, using that information to determine when to stop collecting data from the worker. This results in a more diverse dataset.…”

Section: Diversity Metrics (mentioning)

confidence: 99%

Semantic Diversity in Dialogue with Natural Language Inference

Stasaski, Hearst

2022

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Self Cite


Generating diverse, interesting responses to chitchat conversations is a problem for neural conversational agents. This paper makes two substantial contributions to improving diversity in dialogue generation. First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation. We evaluate this metric using an established framework (Tevet and Berant, 2021) and find strong evidence indicating NLI Diversity is correlated with semantic diversity. Specifically, we show that the contradiction relation is more useful than the neutral relation for measuring this diversity and that incorporating the NLI model's confidence achieves state-of-the-art results. Second, we demonstrate how to iteratively improve the semantic diversity of a sampled set of responses via a new generation procedure called Diversity Threshold Generation, which results in an average 137% increase in NLI Diversity compared to standard generation procedures.



“…Augmented with human feedback data, Gao et al (2020b) proposed that the generated responses could be reranked via a response ranking framework trained on the human feedback data and responses with higher quality including diversity were selected. Stasaski et al (2020) proposed to change the data collection pipeline by iteratively computing the diversity of responses from different human participants in dataset construction and selected those participants who tend to generate informative and diverse responses.…”

Section: Response Diversity (mentioning)

confidence: 99%

Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey

Ni, Young, Pandelea, et al. 2021

Preprint


Dialogue systems are a popular Natural Language Processing (NLP) task as it is promising in real-life applications. It is also a complicated task since many NLP tasks deserving study are involved. As a result, a multitude of novel works on this task are carried out, and most of them are deep learning based due to the outstanding performance. In this survey, we mainly focus on the deep learning based dialogue systems. We comprehensively review state-of-the-art research outcomes in dialogue systems and analyze them from two angles: model type and system type. Specifically, from the angle of model type, we discuss the principles, characteristics, and applications of different models that are widely used in dialogue systems. This will help researchers acquaint these models and see how they are applied in state-of-the-art frameworks, which is rather helpful when designing a new dialogue system. From the angle of system type, we discuss task-oriented and open-domain dialogue systems as two streams of research, providing insight into the hot topics related. Furthermore, we comprehensively review the evaluation methods and datasets for dialogue systems to pave the way for future research. Finally, some possible research trends are identified based on the recent research outcomes. To the best of our knowledge, this survey is the most comprehensive and up-to-date one at present in the area of dialogue systems and dialogue-related tasks, extensively covering the popular frameworks, topics, and datasets.

Footnote 1: The frameworks, topics, and datasets discussed are originated from the extensive literature review of state-of-the-art research. We have tried our best to cover all but may still omit some works. Readers are welcome to provide suggestions regarding the omissions and mistakes in this article. We also intend to update this article with time as and when new approaches or definitions are proposed and used by the community.


“…Increasing dialogue diversity is a long-lasting research interest. Dialogue diversity can be improved via enforcing diversity objective functions (such as maximize mutual information) in neural models (Li et al, 2016a;Baheti et al, 2018), perturbing language rules (Niu and Bansal, 2019) or environment parameters Ruiz et al, 2019), randomizing trajectory synthesis (Andrychowicz et al, 2017;Lu et al, 2019), selecting more diverse data contributors (Stasaski et al, 2020), and sampling trajectories from a diverse set of environments (Chua et al, 2018;Janner et al, 2019). For instance, Campagna et al augmented dialogue data using domain-independent transition rules and domain-specific ontology (Campagna et al, 2020).…”

Section: Diversification In Dialogues (mentioning)

confidence: 99%

“…Although the uses are slightly different, ideas to improve diversification can be universal. Dialogue diversity can be improved via i) enforcing diversity in objective functions (such as maximize mutual information) of neural models (Li et al, 2016a;Baheti et al, 2018), ii) perturbing language rules (Niu and Bansal, 2019) or environment parameters Ruiz et al, 2019), iii) randomizing trajectory synthesis (Andrychowicz et al, 2017;Lu et al, 2019), iv) selecting more diverse data contributors (Stasaski et al, 2020), and v) sampling trajectories from a diverse set of environments (Chua et al, 2018;Janner et al, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

High-Quality Dialogue Diversification by Intermittent Short Extension Ensembles

Tang, Kulkarni, Yang

2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Self Cite


Many task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies that respond to the user appropriately and complete the tasks successfully. Training DRL agents with diverse dialogue trajectories prepare them well for rare user requests and unseen situations. One effective diversification method is to let the agent interact with a diverse set of learned user models. However, trajectories created by these artificial user models may contain generation errors, which can quickly propagate into the agent's policy. It is thus important to control the quality of the diversification and resist the noise. In this paper, we propose a novel dialogue diversification method for task-oriented dialogue systems trained in simulators. Our method, Intermittent Short Extension Ensemble (I-SEE), constrains the intensity to interact with an ensemble of diverse user models and effectively controls the quality of the diversification. Evaluations on the Multiwoz dataset show that I-SEE successfully boosts the performance of several state-of-the-art DRL dialogue agents.
