2020
DOI: 10.1613/jair.1.12012
Annotator Rationales for Labeling Tasks in Crowdsourcing

Abstract: When collecting item ratings from human judges, it can be difficult to measure and enforce data quality due to task subjectivity and lack of transparency into how judges make each rating decision. To address this, we investigate asking judges to provide a specific form of rationale supporting each rating decision. We evaluate this approach on an information retrieval task in which human judges rate the relevance of Web pages for different search topics. Cost-benefit analysis over 10,000 judgments collected on …

Cited by 26 publications (25 citation statements)
References 69 publications
“…This is not a problem if the task is a definite-annotation task, where labels and data are determined in a one-to-one correspondence. However, for tasks where the annotation criterion is not explicitly and uniquely defined, it is difficult to gain uniformity among a large number of labels [13,14,15]. In particular, when the target data are highly specialized, such as medical data, the annotation criteria for tasks, such as diagnosing the presence or absence of a lesion, often depend on the knowledge and experience of the annotator [16].…”
Section: Need For Boosting In Making Datasets With Accurate Labels Fo...
Mentioning confidence: 99%
“…Given the increasing demand for machine learning in recent years, it is undesirable to pay high costs for annotation work to create large datasets. Moreover, the distributed labor force on the Internet is currently utilized for annotation in many areas [13,14,15,18], as a massive labor force is required for making large, accurately annotated datasets.…”
Section: Need For Boosting In Making Datasets With Accurate Labels Fo...
Mentioning confidence: 99%
“…Since the advent of crowdsourcing platforms such as MTurk (Buhrmester et al, 2011), the quality assessment of subjective annotations is a much-researched topic (e.g., Nguyen et al, 2016;Kutlu et al, 2020). Two persisting problems, however, are (1) the opacity of the analytical criteria employed by crowd annotators (especially when using disparate chord vocabularies) and (2) the question of how to assess the quality of annotation sets in which many labels do not coincide (for example in the case of diverging analytical granularities, see Subsection 3.2.1).…”
Section: An Alternative Procedures For Verifying Expert Annotations
Mentioning confidence: 99%
“…The concept tag serves multiple purposes. First, it acts as a rationale (Kutlu et al, 2020;McDonnell et al, 2016), requiring workers to justify their answers and thus nudging them towards high-quality selections. Rationales also provide a form of transparency to help requesters better understand worker intent.…”
Section: Stage 1: Finding Ambiguous Examples
Mentioning confidence: 99%