Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1204

Finding Patterns in Noisy Crowds: Regression-based Annotation Aggregation for Crowdsourced Data

Abstract: Crowdsourcing offers a convenient means of obtaining labeled data quickly and inexpensively. However, crowdsourced labels are often noisier than expert-annotated data, making it difficult to aggregate them meaningfully. We present an aggregation approach that learns a regression model from crowdsourced annotations to predict aggregated labels for instances that have no expert adjudications. The predicted labels achieve a correlation of 0.594 with expert labels on our data, outperforming the best alternative ag…
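The abstract describes learning a regression model over crowdsourced annotations so that instances without expert adjudications can still receive aggregated labels. The sketch below is only an illustration of that kind of pipeline; the feature set (per-instance mean, spread, and annotation count) and the choice of ridge regression are assumptions for the example, not the paper's exact configuration.

```python
# Illustrative sketch of regression-based annotation aggregation.
# Assumptions (not from the paper): features are simple statistics of each
# instance's crowd annotations, and the regressor is ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

def instance_features(annotations):
    """Summarize one instance's crowd annotations as a feature vector."""
    a = np.asarray(annotations, dtype=float)
    return np.array([a.mean(), a.std(), a.min(), a.max(), len(a)])

# Instances with expert adjudications are used to fit the regressor ...
adjudicated = {
    "inst1": ([1, 2, 1, 1, 3], 1.5),   # (crowd annotations, expert label)
    "inst2": ([3, 3, 2, 3, 3], 2.8),
    "inst3": ([0, 1, 0, 0, 1], 0.3),
}
# ... and the remaining instances only have crowd annotations.
unadjudicated = {
    "inst4": [2, 2, 3, 1, 2],
    "inst5": [0, 0, 1, 0, 0],
}

X = np.stack([instance_features(ann) for ann, _ in adjudicated.values()])
y = np.array([gold for _, gold in adjudicated.values()])
model = Ridge(alpha=1.0).fit(X, y)

# Predicted aggregated labels for instances with no expert adjudication.
X_new = np.stack([instance_features(ann) for ann in unadjudicated.values()])
for name, pred in zip(unadjudicated, model.predict(X_new)):
    print(name, round(float(pred), 2))
```

In a setup like this the trained regressor replaces simple averaging: instances whose raw annotations disagree can still receive a continuous aggregated score shaped by patterns learned from the adjudicated subset.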

Cited by 11 publications (10 citation statements). References 7 publications.

“…A range of options have been developed for improving crowdsourcing quality. The most common approach is to collect multiple annotations and then aggregate them (Hovy et al., 2013; Passonneau and Carpenter, 2014; Parde and Nielsen, 2017; Dumitrache et al., 2018). This can identify inconsistencies, but at significant cost as each example must be annotated multiple times.…”
Section: Crowdsourcing Quality (mentioning)
confidence: 99%
“…Training on data with these issues will lead to lower quality models, which in turn decrease the effectiveness of the overall dialog system. Most research on improving data quality has focused on mechanisms such as aggregation (Parde and Nielsen, 2017), worker filtering (Li and Liu, 2015), and attention checks (Oppenheimer et al., 2009). These all raise costs and primarily address clear inconsistencies (such as in examples 1, 4, 5, and 6) but not more subtle cases like the inclusion of "dollar" in examples 2 and 3.…”
Section: Introduction (mentioning)
confidence: 99%
“…We crowdsource advice annotations from Amazon Mechanical Turk. Despite the inherent noise due to crowdsourcing (Parde and Nielsen, 2017), recent work showed that when designed carefully, aggregated crowdsourced annotations are trustworthy even for complex tasks (Nye et al., 2018).…”
Section: Annotation Task (mentioning)
confidence: 99%
“…We collected gold standard metaphor novelty scores for these word pairs in the same manner by which we built our previous VUAMC-based metaphor novelty dataset (Parde and Nielsen, 2018a), used to train the metaphor novelty prediction model in this work. Specifically, we crowdsourced five annotations for each word pair, and automatically aggregated them to continuous scores using a label aggregation model learned from features based on annotation distribution and presumed worker trustworthiness (Parde and Nielsen, 2017). There were two statistically significant differences between the two groups: questions about false positives were rated as clearer than questions about true positives, and questions about true positives were rated as having more depth than questions about false positives.…”
Section: Average Ratings For Question Subgroups (mentioning)
confidence: 99%
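The excerpt above mentions label aggregation features based on the annotation distribution and presumed worker trustworthiness. One common way to approximate a trustworthiness feature, shown here only as a hypothetical sketch (the function name and scoring formula are assumptions, not the authors' formulation), is to score each worker by how closely their labels track the leave-one-out mean of the other workers' labels on shared items.

```python
# Hypothetical sketch of a worker-trustworthiness feature: agreement of each
# worker with the leave-one-out mean of the other workers on shared items.
from collections import defaultdict

def worker_trust(labels):
    """labels: list of (worker_id, item_id, score) crowd annotations."""
    by_item = defaultdict(list)
    for worker, item, score in labels:
        by_item[item].append((worker, score))

    errors = defaultdict(list)
    for item, entries in by_item.items():
        for worker, score in entries:
            others = [s for w, s in entries if w != worker]
            if others:  # need at least one other annotation to compare against
                errors[worker].append(abs(score - sum(others) / len(others)))

    # Lower average disagreement -> higher presumed trustworthiness.
    return {w: 1.0 / (1.0 + sum(e) / len(e)) for w, e in errors.items()}

labels = [
    ("w1", "pair1", 2), ("w2", "pair1", 2), ("w3", "pair1", 0),
    ("w1", "pair2", 3), ("w2", "pair2", 3), ("w3", "pair2", 1),
]
print(worker_trust(labels))  # w1 and w2 score higher than the outlying w3
```

Scores of this kind can then enter a regression-based aggregator alongside distribution statistics, for example as a trust-weighted mean of each item's annotations.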