2017
DOI: 10.3389/fpsyg.2017.00777

An Overview of Interrater Agreement on Likert Scales for Researchers and Practitioners

Abstract: Applications of interrater agreement (IRA) statistics for Likert scales are plentiful in research and practice. IRA may be implicated in job analysis, performance appraisal, panel interviews, and any other approach to gathering systematic observations. Any rating system involving subject-matter experts can also benefit from IRA as a measure of consensus. Further, IRA is fundamental to aggregation in multilevel research, which is becoming increasingly common in order to address nesting. Although, several techni…

Cited by 78 publications (66 citation statements)
References 57 publications (139 reference statements)
“…**See Myszkowski and Storme (this issue) for a discussion of limitations and alternative test options. Though the tests may seemingly produce roughly similar results, there is not always evidence that this has been checked, and the reasons for choosing one test over another, in the absence of explicit justification, can seem arbitrary or based on convention alone. Likewise, the use of other measures to calculate interrater agreement (IRA) popular in other fields of study, such as Finn's rwg, raises similar issues (O'Neill, 2017). While rwg is not a measure commonly used in CAT studies, it has been used by some (e.g., Wigert, Reiter-Palmon,…”
mentioning
confidence: 99%
“…The exploratory and confirmatory factorial analyses confirmed the mono‐dimensionality of the scale by reporting excellent saturations and model fit (Hair et al, ). As far as the correlations with other measures are concerned, the analyses allowed the aggregation of the measures surveyed at the centre level, with some caution in the case of the cynicism scale, for which the rwg value of 0.40 suggested only weak agreement among the employees (O'Neill, ). Then, the significant correlations with all the study variables in the expected direction confirmed the construct validity of the ELS in its factorial, convergent, discriminant and nomological facets.…”
Section: Discussion
mentioning
confidence: 99%
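
For readers unfamiliar with the rwg value quoted above, a minimal sketch of one commonly used single-item form may help: rwg = 1 - S²/σ²_E, where S² is the observed variance of the ratings and σ²_E = (A² - 1)/12 is the variance expected under a uniform (random-response) null on an A-point scale. The citing paper does not say which variant or null distribution it used, so the function name, the uniform null, and the sample ratings below are assumptions for illustration only.

```python
from statistics import variance


def rwg_single_item(ratings, n_options):
    """Single-item r_wg under a uniform null (an assumed, illustrative variant).

    ratings   -- ratings given by the judges to one target on one item
    n_options -- number of response options (A) on the Likert scale
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed to estimate variance")
    observed_var = variance(ratings)          # sample variance (n - 1 denominator)
    expected_var = (n_options ** 2 - 1) / 12  # variance of a discrete uniform null
    return 1 - observed_var / expected_var


# Hypothetical ratings from five judges on a 5-point scale.
example = rwg_single_item([4, 4, 5, 4, 3], n_options=5)
print(round(example, 2))
```

With these made-up ratings the observed variance is 0.5 and the uniform-null variance is 2.0, so the index comes out at 0.75. Values can fall below 0 when observed variance exceeds the null variance, which is one reason the choice of null distribution matters when interpreting a figure such as 0.40.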
“…Also, users might have had different interpretations of the items, as some IRA indices pointed to low or no agreement within 'engagement' items (MyFitnessPal and MyDietDiary), 'information' (MyDietDiary, MyPlate, and SparkPeople) and 'subjective quality' (Lark, MyFitnessPal, and MyDietDiary). While inter-rater agreement does not imply reliability [61,62], low agreement suggests a large degree of subjectivity in evaluating the apps. Furthermore, some users might have not understood some items, especially in the 'information' domain, which registered the highest non-response rates (22-39%).…”
Section: Discussion
mentioning
confidence: 99%
“…For feasibility reasons (budget and time constraints), we could not ask all users to evaluate each app, hence allowing us to calculate inter-rater reliability. To overcome this limitation, we employed methodological adjustments (i.e., IRA estimates [59,61] and response-based weighted means [68]) to ensure the robustness of the responses obtained from the employees recruited in this study. Also, users evaluated the free version of the apps.…”
Section: Limitations
mentioning
confidence: 99%