2015
DOI: 10.1007/978-3-319-18818-8_43

Crowdsourcing Disagreement for Collecting Semantic Annotation

Abstract: This paper proposes an approach to gathering semantic annotation, which rejects the notion that human interpretation can have a single ground truth, and is instead based on the observation that disagreement between annotators can signal ambiguity in the input text, as well as how the annotation task has been designed. The purpose of this research is to investigate whether disagreement-aware crowdsourcing is a scalable approach to gather semantic annotation across various tasks and domains. We propose…

Cited by 16 publications (15 citation statements). References 21 publications.

Citation statements:
“…Krippendorff (2011) argued that there are at least two types of disagreement in content coding: random variation, which comes as an unavoidable by-product of human coding, and systematic disagreement, which is influenced by features of the data or annotators. Dumitrache (2015) identifies different sources of disagreement as (a) the clarity of an annotation label (i.e., task descriptions), (b) the ambiguity of the text, and (c) differences in workers. Aroyo and Welty (2013) also studied inter-annotator disagreement in association with features of the input, showing that it reflects semantic ambiguity of the training instances.…”
Section: Annotation Disagreement
confidence: 99%
“…Dumitrache [12] analyzes three sources of disagreement in crowdsourced annotation tasks by tying them to Knowlton's "triangle of reference" [21], composed of 'sign', 'referent', and 'conception'. These points map respectively to (a) the clarity of an annotation label, (b) the ambiguity of the text, and (c) differences in workers.…”
Section: Sources Of Crowdworker Disagreement
confidence: 99%
“…Wiebe et al. describe an iterative process in which annotators are presented with their original and bias-corrected annotations and given the opportunity to provide feedback [41]. The Crowd Truth system [4,12] illustrates worker agreement or disagreement on individual items by showing the distribution of assigned labels in a color-coded table. This system also provides quantitative metrics for assessing the clarity of specific sentences and labels.…”
Section: Surfacing Worker Disagreement
confidence: 99%
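The clarity metrics mentioned in the statement above are derived from the distribution of labels that workers assign to each item. The following minimal Python sketch is not the CrowdTruth toolkit itself; the function names, labels, and exact formulas are illustrative assumptions. It shows how a sentence-level clarity score can be computed from a label distribution, so that low clarity flags ambiguous sentences rather than "bad" workers.

```python
# Illustrative sketch of a disagreement-aware clarity metric in the spirit of
# CrowdTruth: keep the full distribution of worker labels per sentence and
# derive per-label and per-sentence scores from it (names and formulas are
# simplified assumptions, not the published metric definitions).
from collections import Counter
from math import sqrt

def sentence_vector(worker_annotations):
    """Sum the workers' label choices into one count vector for the sentence."""
    counts = Counter()
    for labels in worker_annotations:          # one set of labels per worker
        counts.update(labels)
    return counts

def sentence_label_score(counts, label):
    """Cosine of the label's unit vector with the sentence count vector."""
    norm = sqrt(sum(c * c for c in counts.values()))
    return counts.get(label, 0) / norm if norm else 0.0

def sentence_clarity(counts):
    """Clarity: score of the most agreed-upon label (low value = ambiguous)."""
    return max((sentence_label_score(counts, l) for l in counts), default=0.0)

# Example: five workers annotating the relation between two terms in a sentence.
workers = [{"treats"}, {"treats"}, {"treats", "prevents"}, {"prevents"}, {"other"}]
print(sentence_clarity(sentence_vector(workers)))  # well below 1.0: disagreement
```

In the example, the workers split across "treats", "prevents", and "other", so the clarity score stays well below 1.0, signalling an ambiguous sentence in the disagreement-aware sense rather than annotator error.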
“…While the availability of such large, human-enriched datasets has been a boon to computer vision research, there is increasing awareness of the human biases that are reflected in crowdsourced data. Dumitrache rejected the notion that there can be a single ground truth in any semantic annotation task, arguing instead for a "disagreement-aware" approach to crowdsourcing [17]. In a similar vein, Chung and colleagues [8] noted the diverse answers often provided by workers, and advocated for reporting statistical distributions of responses, to preserve this diversity.…”
Section: Introduction
confidence: 99%
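As a concrete illustration of the distribution-reporting idea attributed to Chung and colleagues above, the short sketch below (labels and function names are hypothetical) returns the relative frequency of each crowd response instead of collapsing the answers into a single aggregated label.

```python
# Hypothetical sketch: report the full distribution of crowd responses so the
# diversity of worker interpretations is preserved downstream, rather than
# reducing them to one majority-vote label.
from collections import Counter

def response_distribution(responses):
    """Return each label with its relative frequency among the workers."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

print(response_distribution(["positive", "positive", "neutral", "negative", "positive"]))
# {'positive': 0.6, 'neutral': 0.2, 'negative': 0.2}
```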