Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008
DOI: 10.1145/1401890.1401965
Get another label? Improving data quality and data mining using multiple, noisy labelers

Abstract: This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the da…
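The core idea the abstract describes, acquiring several noisy labels per item and aggregating them, is most simply realized by majority voting. The sketch below is not code from the paper; it is a minimal illustration, assuming binary labels and labelers who are each independently correct with the same probability p (all function and variable names are my own):

```python
from collections import Counter
from math import comb

def majority_vote(labels):
    """Aggregate the repeated noisy labels of one item by majority vote.
    Ties break deterministically toward the first-encountered label."""
    return Counter(labels).most_common(1)[0][0]

def majority_quality(p, n):
    """Probability that a majority vote over n labels is correct, when each
    label is independently correct with probability p (binary task, n odd)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# Repeated labeling turns many cheap, noisy labels into one better label:
print(majority_vote(["pos", "neg", "pos"]))   # -> 'pos'
print(round(majority_quality(0.7, 1), 3))     # 0.7   (a single noisy label)
print(round(majority_quality(0.7, 5), 3))     # 0.837 (5 labels per item)
print(round(majority_quality(0.7, 11), 3))    # 0.922 (11 labels per item)
```

The numbers illustrate the paper's starting point: when individual labelers are noticeably better than chance, repeated labeling can raise label quality substantially, though with diminishing returns per additional label.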

Cited by 851 publications (570 citation statements). References 29 publications.
“…It is common to assume ([22,23,27]) that, for each task i ∈ T, given the true label y_i, z_{ijk} and z_{ij'k'} are independent for any j, j' ∈ A, j ≠ j', and k, k' ∈ I.…”
Section: Preliminaries (mentioning)
confidence: 99%
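The quoted assumption is the usual conditional-independence model behind label aggregation: given the true label, labels from different annotators are independent noisy observations. Interpreting it, as is standard, as mutual independence across annotators, it can be written out as follows (a reconstruction in the excerpt's notation, with A the set of annotators and I the index set of repeated labels):

```latex
% Conditional independence across annotators, given the true label:
%   z_{ijk} \perp z_{ij'k'} \mid y_i \quad \text{for all } j \neq j' \in A,\ k, k' \in I.
% Writing z_{ij\cdot} for all labels annotator j gives task i, the likelihood
% of the labels collected for task i then factorizes across annotators:
P\bigl(\{z_{ij\cdot}\}_{j \in A} \bigm| y_i\bigr)
    \;=\; \prod_{j \in A} P\bigl(z_{ij\cdot} \mid y_i\bigr).
```

This is the assumption that makes per-annotator aggregation rules, such as the majority vote sketched above, and likelihood-based label models well founded.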
“…Binary classification tasks are prevalent in many application domains, as it is confirmed, for instance, for the case of Amazon Mechanical Turk in [16]. Since the labeling accuracy can greatly vary from worker to worker, requesters often collect different labels for the same instance and, consequently, rely on some aggregation technique in order to increase the quality of the inferred labels [23] and, thus, reduce the classification error.…”
Section: Introduction (mentioning)
confidence: 99%
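The excerpt refers to "some aggregation technique" without specifying one; a common refinement over plain majority voting, consistent with the independence assumption above, is to weight each worker's vote by the log-odds of an estimated accuracy. The snippet below is an illustrative sketch only (worker accuracies are assumed to come from elsewhere, e.g. gold-standard questions; all identifiers are hypothetical):

```python
from collections import defaultdict
from math import log

def weighted_vote(labels, accuracies):
    """Aggregate one item's labels in a binary task, weighting each worker's
    vote by the log-odds of that worker's estimated accuracy.

    labels:      dict mapping worker_id -> label
    accuracies:  dict mapping worker_id -> estimated accuracy in (0.5, 1.0)
    """
    scores = defaultdict(float)
    for worker, label in labels.items():
        p = accuracies[worker]
        scores[label] += log(p / (1 - p))   # reliable workers carry more weight
    return max(scores, key=scores.get)

# A single reliable worker can outweigh two near-random ones:
labels = {"w1": "neg", "w2": "neg", "w3": "pos"}
accuracies = {"w1": 0.6, "w2": 0.6, "w3": 0.9}
print(weighted_vote(labels, accuracies))   # -> 'pos'
```

Under the binary model with conditionally independent workers, equal class priors, and symmetric error rates, log-odds weights are the Bayes-optimal way to combine votes; plain majority voting is the special case in which all workers are treated as equally accurate.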
“…The work presented in this paper is an extension to a previously published conference paper (Sheng et al, 2008). In the present paper, we present and evaluate two additional algorithms for selectively allocating labeling effort (NLU and NLMU; see Sections 6.3 and 6.4).…”
Section: Related Work (mentioning)
confidence: 99%
“…In particular, Amazon Mechanical Turk has become a popular resource for non-expert annotation of linguistic data for use in diverse NLP applications [44][45][46][47], including sentiment analysis [48][49][50]. While our test data was annotated by research assistants, we elected to employ AMT at various stages of lexicon development and for generating additional training data.…”
Section: Related Work (mentioning)
confidence: 99%