Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM 2013)
DOI: 10.1145/2505515.2507884
An analysis of crowd workers mistakes for specific and complex relevance assessment task

Cited by 4 publications (4 citation statements) · References 4 publications
“…We found no correlation between judging time and topic or document length (measured as the number of characters or words, or with the ARI readability index (Senter and Smith 1967)). This is consistent with the findings of Anderton et al. (2013), who looked at these dimensions within the TREC 2012 Crowdsourcing track collection. We also did not observe a correlation of time spent with relevance level, nor with agreement rates with the Sormunen and TREC judgments.…”
Section: Results (supporting)
Confidence: 90%
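The ARI mentioned in the passage above is a simple function of character, word, and sentence counts. As a rough illustration only, the sketch below (in Python, with made-up documents and judging times, and SciPy's pearsonr standing in for whatever correlation analysis the cited studies actually ran) shows the kind of check being described:

# A minimal sketch (not from the cited papers): compute the ARI readability
# score per document and test whether judging time correlates with
# document length or readability. Documents and times below are made up.
import re
from scipy.stats import pearsonr

def ari(text: str) -> float:
    """Automated Readability Index (Senter and Smith 1967)."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(len(w) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

documents = [
    "The cat sat on the mat. It was warm.",
    "Relevance assessment is hard. Crowd workers vary widely in effort.",
    "Short text.",
    "Judges read the topic first. Then they skim the document. Finally they decide.",
]
judging_times = [12.0, 45.0, 8.0, 30.0]  # seconds per document (hypothetical)

doc_lengths = [len(d.split()) for d in documents]
readability = [ari(d) for d in documents]

r_len, p_len = pearsonr(judging_times, doc_lengths)
r_ari, p_ari = pearsonr(judging_times, readability)
print(f"time vs. length: r={r_len:.2f} (p={p_len:.2f})")
print(f"time vs. ARI:    r={r_ari:.2f} (p={p_ari:.2f})")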
“…In our work we investigate the feasibility of this approach in terms of individual and aggregated judgment quality. Anderton et al. (2013) look at mistakes made by crowd workers performing relevance judgments, comparing them to trained TREC assessors. They show that very short time spent on a judgment leads to lower-quality judgments, and that document length has little impact on the time spent making a judgment.…”
Section: Relevance Judgments and Crowdsourcing (mentioning)
Confidence: 99%
“…For example, Alonso and Mizzaro [2012] compared relevance judgments gathered from the crowd vs. more experienced TREC assessors, finding comparable accuracy. Others have attempted to measure relevance in various ways, including graded judgments [McCreadie et al 2011], preference-based judgments [Anderton et al 2013], and multidimensional ones [Zhang et al 2014]. Collecting relevance labels through crowdsourcing has also been used in practice, for example in the TREC Blog track [McCreadie et al 2011] or in the judgment task of the TREC Crowdsourcing Track [Smucker et al 2014].…”
Section: Crowdsourcing (mentioning)
Confidence: 99%
“…As Rogstadius et al. [2011] show, the accuracy of outputs typically decreases with increasing task complexity. From a design perspective, it is further important to implement tasks in a way that limits cognitive complexity; for instance, comparing two objects is easier than identifying features of individual objects [Anderton et al 2013]. While simplifying more complex tasks leads to longer completion times, it yields higher quality; simpler tasks are also better suited to workers who perform them with interruptions [Cheng et al 2015b].…”
Section: Improve Task Design (mentioning)
Confidence: 99%