“…It is common to assume ([22,23,27]) that, for each task $i \in T$, given the true label $y_i$, $z_{ijk}$ and $z_{ij'k'}$ are independent for any $j, j' \in A$, $j \neq j'$, and $k, k' \in I$.…”
Section: Preliminaries
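The conditional-independence assumption quoted above is what makes label aggregation tractable: given the true label, answers from different workers (and repeated answers from the same worker) factorize. A minimal sketch of the resulting posterior, using the quoted notation and assuming some class prior $P(y_i = c)$ (the prior is our addition, not part of the quote):

```latex
P\bigl(y_i = c \mid \{z_{ijk}\}\bigr)
  \;\propto\; P(y_i = c)
  \prod_{j \in A} \prod_{k \in I} P\bigl(z_{ijk} \mid y_i = c\bigr)
```

Majority voting is the special case in which every worker is assumed equally reliable.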
“…Binary classification tasks are prevalent in many application domains, as it is confirmed, for instance, for the case of Amazon Mechanical Turk in [16]. Since the labeling accuracy can greatly vary from worker to worker, requesters often collect different labels for the same instance and, consequently, rely on some aggregation technique in order to increase the quality of the inferred labels [23] and, thus, reduce the classification error.…”
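As a concrete illustration of the aggregation step described above, here is a minimal majority-vote sketch for binary labels; the function name and the tie-breaking rule are our assumptions, not taken from the cited papers:

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate repeated binary labels for one instance by majority vote.

    `labels` is a list of 0/1 answers from different workers; ties are
    broken toward the positive class here (an arbitrary choice).
    """
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0

# Five workers label the same instance; the inferred label is 1.
print(majority_vote([1, 0, 1, 1, 0]))
```

More sophisticated aggregators weight each worker's vote by an estimated reliability, which is exactly where the independence assumption above does its work.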
Crowdsourcing marketplaces have emerged as an effective tool for high-speed, low-cost labeling of massive data sets. Since the labeling accuracy can greatly vary from worker to worker, we are faced with the problem of assigning labeling tasks to workers so as to maximize the accuracy associated with their answers. In this work, we study the problem of assigning workers to tasks under the assumption that workers' reliability could change depending on their workload, as a result of, e.g., fatigue and learning. We offer empirical evidence of the existence of a workload-dependent accuracy variation among workers, and propose solution procedures for our Crowdsourced Labeling Task Assignment Problem, which we validate on both synthetic and real data sets.
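The abstract above argues that a worker's accuracy depends on how many tasks they have already completed. A hedged sketch of that idea follows: the accuracy curve (rising with learning, falling with fatigue) is purely hypothetical, and the greedy assignment is only an illustration, not the solution procedures the paper proposes:

```python
import heapq

def workload_accuracy(base, n, learning=0.02, fatigue=0.01, peak=5):
    """Hypothetical accuracy of a worker's n-th task: accuracy improves
    with early tasks (learning) and declines past `peak` (fatigue)."""
    if n <= peak:
        return min(0.99, base + learning * n)
    return max(0.5, base + learning * peak - fatigue * (n - peak))

def greedy_assign(base_accuracies, num_tasks):
    """Give each task to the worker whose next answer is expected to be
    most accurate, given how many tasks each worker has done so far."""
    # Heap entries: (negated accuracy of the worker's next task,
    # worker id, base accuracy, tasks completed so far).
    heap = [(-workload_accuracy(b, 1), j, b, 0)
            for j, b in enumerate(base_accuracies)]
    heapq.heapify(heap)
    assignment = []
    for _ in range(num_tasks):
        neg_acc, j, b, done = heapq.heappop(heap)
        assignment.append((j, -neg_acc))
        # After this task the worker has completed `done + 1` tasks,
        # so their next task would be number `done + 2`.
        heapq.heappush(heap, (-workload_accuracy(b, done + 2), j, b, done + 1))
    return assignment

# Two workers; under this toy curve, learning makes the stronger
# worker pull further ahead, so all four tasks go to worker 1.
print(greedy_assign([0.7, 0.8], num_tasks=4))
```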
“…The work presented in this paper is an extension to a previously published conference paper (Sheng et al., 2008). In the present paper, we present and evaluate two additional algorithms for selectively allocating labeling effort (NLU and NLMU; see Sections 6.3 and 6.4).…”
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a set of robust techniques that combine different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
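As a sketch of point (iv), one simple notion of label uncertainty is the entropy of the labels an example has received so far: examples whose repeated labels disagree most are relabeled first. This is only one of the notions the paper combines (it also considers model-driven uncertainty), and the names below are ours:

```python
import math

def label_entropy(pos, neg):
    """Entropy of the observed label counts for one example; high entropy
    means the repeated labels disagree, so another label helps most."""
    n = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / n
            h -= p * math.log2(p)
    return h

def pick_for_relabeling(label_counts, budget):
    """`label_counts` maps example ids to (pos, neg) counts; return the
    `budget` ids whose current labels are most uncertain."""
    ranked = sorted(label_counts,
                    key=lambda e: label_entropy(*label_counts[e]),
                    reverse=True)
    return ranked[:budget]

counts = {"a": (3, 0), "b": (2, 2), "c": (4, 1)}
print(pick_for_relabeling(counts, budget=1))  # -> ['b']
```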
“…In particular, Amazon Mechanical Turk has become a popular resource for non-expert annotation of linguistic data for use in diverse NLP applications [44][45][46][47], including sentiment analysis [48][49][50]. While our test data was annotated by research assistants, we elected to employ AMT at various stages of lexicon development and for generating additional training data.…”
Abstract. While there are a number of subjectivity lexicons available for research purposes, none can be used commercially. We describe the process of constructing subjectivity lexicon(s) for recognizing sentiment polarity in essays written by test-takers, to be used within a commercial essay-scoring system. We discuss ways of expanding a manually-built seed lexicon using dictionary-based, distributional in-domain and out-of-domain information, as well as using Amazon Mechanical Turk to help "clean up" the expansions. We show the feasibility of constructing a family of subjectivity lexicons from scratch using a combination of methods to attain competitive performance with state-of-the-art research-only lexicons. Furthermore, this is the first use, to our knowledge, of a paraphrase generation system for expanding a subjectivity lexicon.