Abstract: Crowdsourcing platforms are commonly used for research in the humanities, social sciences and informatics, including the use of crowdworkers to annotate textual material or visuals. Utilizing two empirical studies, this article systematically assesses the potential of crowdcoding for less manifest contents of news texts, here focusing on political actor evaluations. Specifically, Study 1 compares the reliability and validity of crowdcoded data to that of manual content analyses; Study 2 proceeds to investigate…
“…Given that coders are treated as interchangeable, any (potentially) remaining coder idiosyncrasies (either coder-specific systematic errors or random measurement errors) are in effect no longer considered, either in the analyses or in the interpretations of the findings (see , for a detailed discussion of this issue). When there is a sufficiently large number of coders, or when each piece of material is coded by multiple coders ("duplicated coding", as in some SML applications or in crowdcoding: see Lind, Gruber, & Boomgaarden, 2017; Scharkow, 2013), the impact of coder idiosyncrasies, especially random errors, would diminish, as such errors cancel each other out as the number of coders or duplicated coding instances increases. Nevertheless, remaining systematic errors in coder idiosyncrasies may still introduce bias in gold standard materials with respect to the target of inference (i.e., a systematic deviation from the true target), especially for data with a higher level of intercoder reliability.…”
Section: Design and Setup of Monte Carlo Simulations (mentioning)
Political communication has become one of the central arenas of innovation in the application of automated analysis approaches to ever-growing quantities of digitized texts. However, although researchers routinely and conveniently resort to certain forms of human coding to validate the results derived from automated procedures, in practice the actual "quality assurance" of such a "gold standard" often goes unchecked. Contemporary practices of validation via manual annotations are far from being acknowledged as best practices in the literature, and the reporting and interpretation of validation procedures differ greatly. We systematically assess the connection between the quality of human judgment in manual annotations and the relative performance evaluations of automated procedures against true standards by relying on large-scale Monte Carlo simulations. The results from the simulations confirm that there is a substantially greater risk of a researcher reaching an incorrect conclusion regarding the performance of automated procedures when the quality of the manual annotations used for validation is not properly ensured. Our contribution should therefore be regarded as a call for the systematic application of high-quality manual validation materials in any political communication study drawing on automated text analysis procedures.
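To make the mechanism concrete, the following minimal sketch (not the authors' actual simulation design; all error rates, sample sizes, and variable names are illustrative assumptions) shows how random coder error attenuates when judgements are aggregated across coders, and how validating an automated classifier against a noisy single-coder gold standard can understate its true accuracy.

```python
# Minimal Monte Carlo sketch (illustrative assumptions only): how noisy
# "gold standard" annotations can distort the apparent accuracy of an
# automated classifier, and how duplicated coding mitigates random error.
import numpy as np

rng = np.random.default_rng(42)

N_DOCS = 2_000          # validation documents (assumption)
CLASSIFIER_ACC = 0.80   # true accuracy of the automated procedure (assumption)
CODER_ERROR = 0.20      # per-coder random error rate (assumption)
N_CODERS = 5            # coders per document under "duplicated coding"

def noisy_copy(labels, error_rate):
    """Flip each binary label with probability `error_rate` (random coder error)."""
    flips = rng.random(labels.size) < error_rate
    return np.where(flips, 1 - labels, labels)

# 1) True labels of the validation documents.
truth = rng.integers(0, 2, N_DOCS)

# 2) Automated classifier output with a known true accuracy.
automated = noisy_copy(truth, 1 - CLASSIFIER_ACC)

# 3a) Gold standard produced by a single error-prone coder.
gold_single = noisy_copy(truth, CODER_ERROR)

# 3b) Gold standard as the majority vote of several coders: independent
#     random errors largely cancel out as coders are added.
votes = np.stack([noisy_copy(truth, CODER_ERROR) for _ in range(N_CODERS)])
gold_majority = (votes.mean(axis=0) > 0.5).astype(int)

print("accuracy vs. truth:          ", (automated == truth).mean())
print("accuracy vs. single coder:   ", (automated == gold_single).mean())
print("accuracy vs. majority coding:", (automated == gold_majority).mean())
```

With uncorrelated errors, agreement with the single noisy coder is expected to sit well below the classifier's true accuracy, while the majority-coded gold standard recovers an estimate much closer to it; systematic (shared) coder errors, by contrast, would not cancel out in this way.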
“…Crowd-coding is both hailed as a useful strategy and viewed critically (Snow et al. 2008; Benoit et al. 2016; Lind et al. 2017; Dreyfuss 2018). Because Krippendorff's alpha was not higher for certain categories, we carried out additional analyses to see whether our results remain robust to the exclusion of certain workers.…”
Section: Crowd-coding of Open-ended Responses (mentioning)
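The robustness check quoted above, recomputing intercoder reliability while excluding individual workers, could be sketched roughly as follows. The example assumes the third-party Python package krippendorff and a purely illustrative ratings matrix; it is not the study's actual data or code.

```python
# Hypothetical leave-one-worker-out robustness check for crowd-coded data.
# Assumes the third-party `krippendorff` package (pip install krippendorff);
# the ratings matrix below is purely illustrative, not data from the study.
import numpy as np
import krippendorff

# Rows = crowd workers, columns = coded units; np.nan marks units a worker skipped.
ratings = np.array([
    [1, 0, 1, 1, 0, np.nan, 1,      0],
    [1, 0, 1, 0, 0, 1,      1,      0],
    [1, 1, 1, 1, 0, 1,      np.nan, 0],
    [0, 0, 1, 1, 0, 1,      1,      1],
])

overall = krippendorff.alpha(reliability_data=ratings,
                             level_of_measurement="nominal")
print(f"alpha with all workers: {overall:.3f}")

# Drop one worker at a time and recompute alpha to see whether any single
# worker drives the (dis)agreement.
for w in range(ratings.shape[0]):
    subset = np.delete(ratings, w, axis=0)
    a = krippendorff.alpha(reliability_data=subset,
                           level_of_measurement="nominal")
    print(f"alpha without worker {w}: {a:.3f}")
```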
Michael S. Moore is among the most prominent normative theorists to argue that retributive justice, understood as the deserved suffering of offenders, justifies punishment. Moore claims that the principle of retributive justice is pervasively supported by our judgments of justice and sufficient to ground punishment. We offer an experimental assessment of these two claims: (1) the pervasiveness claim, according to which people are widely prone to endorse retributive judgments, and (2) the sufficiency claim, according to which no non-retributive principle is necessary for justifying punishment. We test these two claims in a survey and a related survey experiment in which we present participants (N ≈ 900) with the stylized description of a criminal case. Our results seem to invalidate claim (1) and provide mixed results concerning claim (2). We conclude that retributive justice theories which advance either of these two claims need to reassess their evidential support. Address: University of Mannheim, MZES, A5 6, 68159 Mannheim. Data and RMarkdown code to fully reproduce the study are available upon request and will be stored online in the Harvard Dataverse upon publication.
“…This simple but powerful idea that good collective decisions can emerge from averaging many independent judgements of non-experts has long been discussed in academia, business and popular science (see Surowiecki 2004; Lehman & Zobel 2017). Yet, notwithstanding instructive earlier studies with positive conclusions regarding the validity of crowd-coded data (e.g., Berinsky et al. 2014; Haselmayer & Jenny 2016; Lind et al. 2017), it seems fair to say that crowd-coding is only starting to gain traction in political science at large since Benoit et al. (2016) have convincingly argued that the results of expert judgements – still considered the gold standard by many (e.g., when it comes to the location of parties) – can be matched with crowd-coding, at least for simple coding tasks. This is significant, since experts are expensive and in short supply, and automated (coding) methods are not yet good enough at extracting meaning (Benoit et al. 2016: 280).…”
Section: Introduction (mentioning)
confidence: 96%
“…; Haselmayer & Jenny 2016; Lind et al. 2017), it seems fair to say that crowd‐coding is only starting to gain traction in political science at large since Benoit et al. (2016) have convincingly argued that the results of expert judgements – still considered the gold standard by many (e.g., when it comes to the location of parties) – can be matched with crowd‐coding, at least for simple coding tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…This averaged confidence can also be conceived of as inter‐coder reliability within the crowd (see Lind et al. 2017). Everyone can use the above formula to arrive at the confidence score for a given statement, so it is very transparent.…”
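The formula referred to in this quote is not reproduced on this page. One common way to operationalise such a per-statement confidence score is the share of crowd coders who assign the modal code, sketched below with hypothetical judgements; this may differ from the citing article's exact formula.

```python
# Illustrative per-statement "confidence" score: the share of crowd coders
# who assign the modal (most frequent) code. One common operationalisation
# of agreement within the crowd; not necessarily the citing article's formula.
from collections import Counter

def crowd_confidence(judgements):
    """Return (modal_code, share_of_coders_agreeing) for one statement."""
    counts = Counter(judgements)
    modal_code, modal_count = counts.most_common(1)[0]
    return modal_code, modal_count / len(judgements)

# Hypothetical example: five crowd coders judging one party statement.
judgements = ["equality", "equality", "equality", "other", "equality"]
code, confidence = crowd_confidence(judgements)
print(code, confidence)   # equality 0.8
```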
Crowd‐coding is a novel technique that allows for fast, affordable and reproducible online categorisation of large numbers of statements. It combines judgements by multiple, paid, non‐expert coders to avoid miscoding(s). It has been argued that crowd‐coding could replace expert judgements, using the coding of political texts as an example in which both strategies produce similar results. Since crowd‐coding yields the potential to extend the replication standard to data production and to ‘scale’ coding schemes based on a modest number of carefully devised test questions and answers, it is important that its possibilities and limitations are better understood. While previous results for low-complexity coding tasks are encouraging, this study assesses whether and under what conditions simple and complex coding tasks can be outsourced to the crowd without sacrificing content validity in return for scalability. The simple task is to decide whether a party statement counts as a positive reference to a concept – in this case: equality. The complex task is to distinguish between five concepts of equality. To account for the crowd‐coders' contextual knowledge, the IP restrictions are varied. The basis for comparisons is 1,404 party statements, coded by experts and the crowd (resulting in 30,000 online judgements). Comparisons of the expert‐crowd match at the level of statements and party manifestos show that the results are substantively similar even for the complex task, suggesting that complex category schemes can be scaled via crowd‐coding. The match is only slightly higher when IP restrictions are used as an approximation of coder expertise.
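A rough sketch of the statement-level expert-crowd comparison described in this abstract: aggregate the crowd judgements for each statement by majority vote, then compute the share of statements on which the aggregated crowd code matches the expert code. The data, identifiers, and category labels below are hypothetical, not taken from the study.

```python
# Hypothetical sketch of a statement-level expert-crowd comparison:
# aggregate each statement's crowd judgements by majority vote, then compute
# the share of statements where the aggregated crowd code matches the expert
# code. Data and labels are illustrative, not from the study.
from collections import Counter, defaultdict

# (statement_id, crowd_code) pairs: several online judgements per statement.
crowd_judgements = [
    (1, "equality"), (1, "equality"), (1, "other"),
    (2, "other"),    (2, "other"),    (2, "other"),
    (3, "equality"), (3, "other"),    (3, "other"),
]
expert_codes = {1: "equality", 2: "other", 3: "equality"}

# Majority vote per statement.
by_statement = defaultdict(list)
for sid, code in crowd_judgements:
    by_statement[sid].append(code)
crowd_majority = {sid: Counter(codes).most_common(1)[0][0]
                  for sid, codes in by_statement.items()}

# Share of statements where the crowd's majority code matches the expert code.
match = sum(crowd_majority[sid] == expert_codes[sid] for sid in expert_codes)
print(f"expert-crowd match: {match / len(expert_codes):.2f}")  # 0.67 here
```

An analogous manifesto-level comparison would aggregate the same codes within each party manifesto (e.g., as category shares) before comparing expert and crowd results.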