2012
DOI: 10.20982/tqmp.08.1.p023

Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

Abstract: Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, s…
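To illustrate the kind of IRR statistic the tutorial covers, here is a minimal Python sketch of Cohen's kappa for two coders assigning nominal codes to the same items. The coders, codes, and data below are invented for illustration and are not drawn from the paper.

```python
# Illustrative sketch (invented data): Cohen's kappa for two coders
# assigning one nominal code per item.
from collections import Counter

coder_a = ["anger", "joy", "joy", "sadness", "anger", "joy", "sadness", "anger"]
coder_b = ["anger", "joy", "sadness", "sadness", "anger", "joy", "joy", "anger"]

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters on nominal codes."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n                     # p_o
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / n**2  # p_e
    return (observed - expected) / (1 - expected)

print(f"kappa = {cohens_kappa(coder_a, coder_b):.3f}")
```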

Cited by 3,002 publications (2,266 citation statements)
References 19 publications (52 reference statements)
“…Intrarater considered single measures whereas interrater considered average measures, both seeking absolute agreement [19,42]. The intra- and interrater anatomical observations were in strong to near-perfect agreement (0.847 ≤ ICC Obs1 ≤ 0.987; 0.867 ≤ ICC Obs2 ≤ 0.967; 0.703 ≤ ICC Obs1-2 ≤ 0.886; Table 1).…”
Section: Methods (mentioning)
confidence: 91%
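The excerpt above reports intraclass correlation coefficients (ICCs) computed for absolute agreement, as single measures (intrarater) and average measures (interrater). The sketch below shows both forms under a two-way random-effects model, built from the ANOVA mean squares; the function name and the six-specimen dataset are invented and do not reproduce the cited study's data or software.

```python
# Hedged sketch (assumed data): ICC for absolute agreement, two-way random
# effects, as single measures ICC(A,1) and average measures ICC(A,k).
import numpy as np

def icc_absolute_agreement(ratings):
    """ratings: n_subjects x n_raters array. Returns (ICC(A,1), ICC(A,k))."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects (rows)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters (columns)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # residual error

    icc_a1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_ak = (msr - mse) / (msr + (msc - mse) / n)
    return icc_a1, icc_ak

# Hypothetical measurements from two observers on six specimens.
scores = [[9.1, 9.3], [7.8, 8.0], [8.5, 8.4], [6.9, 7.3], [9.0, 9.2], [7.5, 7.6]]
single, average = icc_absolute_agreement(scores)
print(f"ICC(A,1) = {single:.3f}, ICC(A,{len(scores[0])}) = {average:.3f}")
```

ICC(A,1) estimates the reliability of a single rater's scores, whereas ICC(A,k) estimates the reliability of the mean of k raters, which is why the intrarater analysis above uses single measures and the interrater analysis uses average measures.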
“…Then, two coders (MEW and RSX) independently coded each response as fitting one or more of these themes (i.e., each response was coded as fitting or not fitting each of the themes). We calculated the percent agreement on coding and kappas for the coding of each theme [31]. For the survey responses with discrepantly coded themes, the coders reached consensus through discussion.…”
Section: Results (mentioning)
confidence: 99%
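As a rough illustration of the per-theme agreement statistics described in this excerpt, the sketch below computes percent agreement and Cohen's kappa for binary theme codes from two coders. The theme names, codes, and helper functions are hypothetical.

```python
# Hedged sketch (invented data): two coders mark each response as fitting (1)
# or not fitting (0) each theme; report percent agreement and kappa per theme.
themes = {
    "barriers":   ([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 1, 0, 0]),
    "motivation": ([0, 1, 1, 0, 0, 1, 1, 0], [0, 1, 1, 0, 1, 1, 1, 0]),
}

def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def binary_kappa(a, b):
    n = len(a)
    p_o = percent_agreement(a, b)
    p1a, p1b = sum(a) / n, sum(b) / n                # marginal "fits theme" rates
    p_e = p1a * p1b + (1 - p1a) * (1 - p1b)          # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

for theme, (coder_a, coder_b) in themes.items():
    print(f"{theme}: agreement = {percent_agreement(coder_a, coder_b):.2f}, "
          f"kappa = {binary_kappa(coder_a, coder_b):.2f}")
```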
“…Observed repetitions from both sets of data collection forms were extracted into SPSS to analyse inter-rater reliability (IRR). As the number of repetitions was ordinal-level data, the intraclass correlation coefficient (ICC) was used to assess IRR, as advised by Hallgren [31]. An ICC of 0.968 indicated substantial reliability between raters.…”
Section: Data Collection (mentioning)
confidence: 99%
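For labelling an ICC such as the 0.968 reported above, one commonly cited set of cutoffs discussed in Hallgren's tutorial is Cicchetti (1994): poor below .40, fair .40 to .59, good .60 to .74, excellent .75 to 1.00. The helper below is a hypothetical sketch of that mapping; the qualitative label used in the cited study ("substantial") may follow a different convention.

```python
# Hedged sketch: map an ICC value to a qualitative label using the
# Cicchetti (1994) cutoffs (assumed here, not taken from the cited study).
def interpret_icc(icc):
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"

print(interpret_icc(0.968))  # -> "excellent"
```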