2016
DOI: 10.1017/s0003055416000058

Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

Abstract: Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of non-experts, we generate results comparable to those…

Cited by 217 publications (185 citation statements) · References 45 publications
“…These parameters represent the degree to which an expert diverges from other experts who code the same cases. This operationalization aligns with classic definitions of reliability (Carmines and Zeller, 1979), as well as recent empirical work examining convergence among workers on crowd-sourcing platforms when coding the same cases (Benoit et al., 2016; Marquardt et al., 2017). As potential correlates of reliability, we use both demographic data from a post-survey questionnaire and the coding characteristics of experts.…”
mentioning
confidence: 84%
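The excerpt above operationalizes a coder's (un)reliability as divergence from other coders who code the same cases. As a rough, hypothetical illustration of that idea (not the cited authors' actual model), the sketch below scores each coder by the average absolute distance between their codes and the mean code of the remaining coders; the data layout and numeric coding scale are assumptions.

```python
# Minimal sketch (illustrative only): reliability as divergence from other coders
# coding the same cases. Assumes a numeric coding scale and a coders-by-cases matrix.
import numpy as np

def coder_divergence(codes: np.ndarray) -> np.ndarray:
    """codes: (n_coders, n_cases) matrix of numeric codes; NaN marks uncoded cases."""
    n_coders, _ = codes.shape
    divergence = np.full(n_coders, np.nan)
    for i in range(n_coders):
        others = np.delete(codes, i, axis=0)       # drop coder i
        other_mean = np.nanmean(others, axis=0)    # consensus of the remaining coders
        diff = np.abs(codes[i] - other_mean)       # distance on each shared case
        divergence[i] = np.nanmean(diff)           # higher value = less reliable coder
    return divergence

# Example: the third coder systematically diverges from the other two.
codes = np.array([[1.0, 2.0, 3.0, 2.0],
                  [1.0, 2.0, 3.0, 2.0],
                  [3.0, 1.0, 1.0, 3.0]])
print(coder_divergence(codes))   # approximately [0.75, 0.75, 1.5]
```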
“…The data generation process can potentially be very quick, even for larger amounts of data, and it likely comes at considerably lower cost than a traditional manual approach. Furthermore, crowd-coded content analysis data may potentially be more reliable and easier to replicate (Benoit, Conway, Lauderdale, Laver, & Mikhaylov, 2016). While the discipline has fairly standardized procedures for manual content analysis, such procedures are lacking for crowd-sourced content analysis.…”
mentioning
confidence: 99%
“…With many coders and a reasonably optimistic assumption on their individual accuracy, however, the majority is able to quite reliably select the correct category (see Figure 2). This is why crowd-sourcing approaches to content analysis can produce data of acceptable quality from multiple codings per unit by minimally trained coders (Benoit, Conway, Lauderdale, Laver, & Mikhaylov, 2016). In the end, whether or not a researcher is willing to trust a majority standard depends on his or her assumptions about the accuracy of the single coders.…”
Section: Approximations of the Misclassification Matrix
mentioning
confidence: 99%
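The majority-vote logic in the last excerpt is essentially a Condorcet jury argument: if each coder independently picks the correct category with probability p > 0.5, the probability that the majority is correct grows quickly with the number of coders. A minimal sketch, assuming a binary correct/incorrect coding decision with identical, independent coders (an assumption made here for illustration, not the setup of the cited paper):

```python
# Minimal sketch: probability that a strict majority of k independent coders,
# each correct with probability p, selects the correct category.
from math import comb

def majority_correct(p: float, k: int) -> float:
    """Sum the binomial probabilities of more than k/2 coders being correct."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j)
               for j in range(k // 2 + 1, k + 1))

# Modestly accurate individual coders aggregate into a reliable majority.
for k in (1, 5, 15):
    print(k, round(majority_correct(0.7, k), 3))
# prints approximately: 1 0.7, 5 0.837, 15 0.95
```

With individual accuracy of only 0.7, five codings per unit already yield a majority that is correct roughly 84% of the time, and fifteen codings reach roughly 95%, which illustrates why multiple codings per unit by minimally trained coders can produce data of acceptable quality.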