Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.446

More Diverse Dialogue Datasets via Diversity-Informed Data Collection

Abstract: Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models are known to have a drawback of often producing uninteresting, predictable responses; this is known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which con…

Cited by 14 publications (7 citation statements)
References: 25 publications
“…Other past work has incorporated diversity metrics into the dialogue dataset creation pipeline. Stasaski et al (2020) propose a method which measures the diversity of a crowdworker's contributions compared to a corpus, using that information to determine when to stop collecting data from the worker. This results in a more diverse dataset.…”
Section: Diversity Metrics (mentioning)
confidence: 99%
“…Augmented with human feedback data, Gao et al (2020b) proposed that the generated responses could be reranked via a response ranking framework trained on the human feedback data and responses with higher quality including diversity were selected. Stasaski et al (2020) proposed to change the data collection pipeline by iteratively computing the diversity of responses from different human participants in dataset construction and selected those participants who tend to generate informative and diverse responses.…”
Section: Response Diversity (mentioning)
confidence: 99%
“…Increasing dialogue diversity is a long-lasting research interest. Dialogue diversity can be improved via enforcing diversity objective functions (such as maximize mutual information) in neural models (Li et al, 2016a; Baheti et al, 2018), perturbing language rules (Niu and Bansal, 2019) or environment parameters Ruiz et al, 2019), randomizing trajectory synthesis (Andrychowicz et al, 2017; Lu et al, 2019), selecting more diverse data contributors (Stasaski et al, 2020), and sampling trajectories from a diverse set of environments (Chua et al, 2018; Janner et al, 2019). For instance, Campagna et al augmented dialogue data using domain-independent transition rules and domain-specific ontology (Campagna et al, 2020).…”
Section: Diversification In Dialogues (mentioning)
confidence: 99%
“…Although the uses are slightly different, ideas to improve diversification can be universal. Dialogue diversity can be improved via i) enforcing diversity in objective functions (such as maximize mutual information) of neural models (Li et al, 2016a; Baheti et al, 2018), ii) perturbing language rules (Niu and Bansal, 2019) or environment parameters Ruiz et al, 2019), iii) randomizing trajectory synthesis (Andrychowicz et al, 2017; Lu et al, 2019), iv) selecting more diverse data contributors (Stasaski et al, 2020), and v) sampling trajectories from a diverse set of environments (Chua et al, 2018; Janner et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%