“…It is common to assume ([22,23,27]) that, for each task $i \in T$, given the true label $y_i$, $z_{ijk}$ and $z_{ij'k'}$ are independent for any $j, j' \in A$, $j \neq j'$, and $k, k' \in I$.…”
Section: Preliminaries
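The conditional-independence assumption quoted above is what makes label aggregation tractable: given the true label, answers from different workers (and repeated answers from the same worker) factorize. A minimal sketch of the resulting posterior, using the quoted notation and assuming some class prior $P(y_i = c)$ (the prior is our addition, not part of the quote):

```latex
P\bigl(y_i = c \mid \{z_{ijk}\}\bigr)
  \;\propto\; P(y_i = c)
  \prod_{j \in A} \prod_{k \in I} P\bigl(z_{ijk} \mid y_i = c\bigr)
```

Majority voting is the special case in which every worker is assumed equally reliable.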
“…Binary classification tasks are prevalent in many application domains, as it is confirmed, for instance, for the case of Amazon Mechanical Turk in [16]. Since the labeling accuracy can greatly vary from worker to worker, requesters often collect different labels for the same instance and, consequently, rely on some aggregation technique in order to increase the quality of the inferred labels [23] and, thus, reduce the classification error.…”
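As a concrete illustration of the aggregation step described above, here is a minimal majority-vote sketch for binary labels; the function name and the tie-breaking rule are our assumptions, not taken from the cited papers:

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate repeated binary labels for one instance by majority vote.

    `labels` is a list of 0/1 answers from different workers; ties are
    broken toward the positive class here (an arbitrary choice).
    """
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0

# Five workers label the same instance; the inferred label is 1.
print(majority_vote([1, 0, 1, 1, 0]))
```

More sophisticated aggregators weight each worker's vote by an estimated reliability, which is exactly where the independence assumption above does its work.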
Crowdsourcing marketplaces have emerged as an effective tool for high-speed, low-cost labeling of massive data sets. Since the labeling accuracy can greatly vary from worker to worker, we are faced with the problem of assigning labeling tasks to workers so as to maximize the accuracy associated with their answers. In this work, we study the problem of assigning workers to tasks under the assumption that workers' reliability could change depending on their workload, as a result of, e.g., fatigue and learning. We offer empirical evidence of the existence of a workload-dependent accuracy variation among workers, and propose solution procedures for our Crowdsourced Labeling Task Assignment Problem, which we validate on both synthetic and real data sets.
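The abstract above argues that a worker's accuracy depends on how many tasks they have already completed. A hedged sketch of that idea follows: the accuracy curve (rising with learning, falling with fatigue) is purely hypothetical, and the greedy assignment is only an illustration, not the solution procedures the paper proposes:

```python
import heapq

def workload_accuracy(base, n, learning=0.02, fatigue=0.01, peak=5):
    """Hypothetical accuracy of a worker's n-th task: accuracy improves
    with early tasks (learning) and declines past `peak` (fatigue)."""
    if n <= peak:
        return min(0.99, base + learning * n)
    return max(0.5, base + learning * peak - fatigue * (n - peak))

def greedy_assign(base_accuracies, num_tasks):
    """Give each task to the worker whose next answer is expected to be
    most accurate, given how many tasks each worker has done so far."""
    # Heap entries: (negated accuracy of the worker's next task,
    # worker id, base accuracy, tasks completed so far).
    heap = [(-workload_accuracy(b, 1), j, b, 0)
            for j, b in enumerate(base_accuracies)]
    heapq.heapify(heap)
    assignment = []
    for _ in range(num_tasks):
        neg_acc, j, b, done = heapq.heappop(heap)
        assignment.append((j, -neg_acc))
        # After this task the worker has completed `done + 1` tasks,
        # so their next task would be number `done + 2`.
        heapq.heappush(heap, (-workload_accuracy(b, done + 2), j, b, done + 1))
    return assignment

# Two workers; under this toy curve, learning makes the stronger
# worker pull further ahead, so all four tasks go to worker 1.
print(greedy_assign([0.7, 0.8], num_tasks=4))
```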
“…The work presented in this paper is an extension to a previously published conference paper (Sheng et al., 2008). In the present paper, we present and evaluate two additional algorithms for selectively allocating labeling effort (NLU and NLMU; see Sections 6.3 and 6.4).…”
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a set of robust techniques that combine different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
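As a sketch of point (iv), one simple notion of label uncertainty is the entropy of the labels an example has received so far: examples whose repeated labels disagree most are relabeled first. This is only one of the notions the paper combines (it also considers model-driven uncertainty), and the names below are ours:

```python
import math

def label_entropy(pos, neg):
    """Entropy of the observed label counts for one example; high entropy
    means the repeated labels disagree, so another label helps most."""
    n = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / n
            h -= p * math.log2(p)
    return h

def pick_for_relabeling(label_counts, budget):
    """`label_counts` maps example ids to (pos, neg) counts; return the
    `budget` ids whose current labels are most uncertain."""
    ranked = sorted(label_counts,
                    key=lambda e: label_entropy(*label_counts[e]),
                    reverse=True)
    return ranked[:budget]

counts = {"a": (3, 0), "b": (2, 2), "c": (4, 1)}
print(pick_for_relabeling(counts, budget=1))  # -> ['b']
```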
“…In particular, Amazon Mechanical Turk has become a popular resource for non-expert annotation of linguistic data for use in diverse NLP applications [44][45][46][47], including sentiment analysis [48][49][50]. While our test data was annotated by research assistants, we elected to employ AMT at various stages of lexicon development and for generating additional training data.…”
Abstract. While there are a number of subjectivity lexicons available for research purposes, none can be used commercially. We describe the process of constructing subjectivity lexicon(s) for recognizing sentiment polarity in essays written by test-takers, to be used within a commercial essay-scoring system. We discuss ways of expanding a manually-built seed lexicon using dictionary-based, distributional in-domain and out-of-domain information, as well as using Amazon Mechanical Turk to help "clean up" the expansions. We show the feasibility of constructing a family of subjectivity lexicons from scratch using a combination of methods to attain competitive performance with state-of-the-art research-only lexicons. Furthermore, this is the first use, to our knowledge, of a paraphrase generation system for expanding a subjectivity lexicon.