1991
DOI: 10.1177/014662169101500101
Reliability of Ratings for Multiple Judges: Intraclass Correlation and Metric Scales

Abstract: Scale-dependent procedures are presented for assessing the reliability of ratings for multiple judges using intraclass correlation. Scale type is defined in terms of admissible transformations, and standardizing transformations for ratio and interval scales are presented to solve the problem of adjusting ratings for "arbitrary scale factors" (unit and/or origin of the scale). The theory of meaningfulness of numerical statements is introduced and the coefficient of relational agreement (Stine, 1989b) is defined…
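
As a quick illustration of the intraclass correlation the abstract refers to: for n targets rated by k judges, the ICC is built from a two-way ANOVA decomposition of the ratings matrix. The sketch below follows the standard Shrout & Fleiss (1979) conventions rather than the paper's own scale-dependent procedures; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def icc_two_way(x):
    """Two-way ANOVA intraclass correlations for an (n targets x k judges)
    ratings matrix, in the Shrout & Fleiss (1979) conventions."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)  # between targets
    ms_cols = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)  # between judges
    ss_err = (np.sum((x - grand) ** 2)
              - (n - 1) * ms_rows - (k - 1) * ms_cols)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # ICC(2,1): reliability of a single judge's rating, judges treated as random.
    icc_single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    # ICC(2,k): reliability of the mean of all k judges' ratings.
    icc_mean = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return icc_single, icc_mean

# Worked example: 6 targets rated by 4 judges (Shrout & Fleiss's classic data).
ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]])
print(icc_two_way(ratings))  # approx. (0.29, 0.62)
```

Note how the averaged-judges coefficient exceeds the single-judge one; the citing studies below lean on exactly this property of multiple ratings.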

Cited by 16 publications (5 citation statements)
References 14 publications

“…However, it is exactly because of this that the practice has been criticised in some studies as lacking reliability (see, for example, Boud, 1986; Swanson et al., 1991). A study by Falchikov & Magin (1997, p. 386) suggests that low rater reliability can be overcome with the use of multiple ratings, and this claim is in line with other studies (see, for example, Fagot, 1991; Houston et al., 1991; Magin, 1993). In another study, Falchikov (1986) has shown that devolving the assessment of group processes to peers can be carried out with a reasonable degree of reliability, although peer-teacher correlational analysis is obviously not possible in situations where only students give assessments.…”
Section: Peer Assessment of Group Work (supporting)
confidence: 68%

“…A very large number of assessors appears to produce marks that resemble those of the teacher less well than marks produced by a smaller number of raters or singletons. We were surprised to find that singletons performed as well as larger groups of students, given that it is generally acknowledged that multiple ratings are superior to single ones (e.g., Cox, 1967; Fagot, 1991). It has been argued that the use of multiple raters tends to improve reliability by increasing the ratio of true score variance to error variance (e.g., Ferguson, 1966).…”
Section: Discussion (mentioning)
confidence: 94%
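
The mechanism this excerpt invokes is captured by the Spearman-Brown prophecy formula: averaging k parallel ratings leaves true-score variance intact while shrinking error variance. A minimal sketch, with an assumed single-rater reliability of 0.40 (a hypothetical figure, not taken from the studies cited):

```python
def spearman_brown(r1: float, k: int) -> float:
    """Reliability of the mean of k parallel ratings, given
    single-rater reliability r1 (Spearman-Brown prophecy)."""
    return k * r1 / (1 + (k - 1) * r1)

for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.40, k), 2))
# 1 -> 0.4, 2 -> 0.57, 4 -> 0.73, 8 -> 0.84
```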

“…Furthermore, although meta-analyses have revealed that the criterion-related validity coefficient of peer assessment can reach as high as 0.69, validity coefficients can differ significantly between studies (Falchikov & Goldfinch, 2000). In addition, although validity should theoretically increase with the number of assessors (Fagot, 1991; Houston, Raymond, & Svec, 1991), in practice this often fails to hold. For example, the meta-analysis of Falchikov and Goldfinch (2000) revealed that the criterion-related validity obtained by a single assessor is not necessarily lower than that obtained by multiple assessors.…”
Section: Introduction (mentioning)
confidence: 99%
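
Both halves of this statement, the theoretical gain from more assessors and the limited gains seen in practice, fall out of combining Spearman-Brown aggregation with the correction-for-attenuation bound (observed validity is at most true validity times the square root of reliability). A hypothetical sketch; the true validity of 0.80 and single-rater reliability of 0.40 are assumed for illustration, not taken from Falchikov and Goldfinch (2000):

```python
import math

def validity_of_mean(true_validity: float, r1: float, k: int) -> float:
    """Expected criterion validity of the mean of k raters: boost the
    raters' reliability via Spearman-Brown, then apply the attenuation bound."""
    rk = k * r1 / (1 + (k - 1) * r1)      # reliability of the k-rater mean
    return true_validity * math.sqrt(rk)  # validity ceiling at that reliability

for k in (1, 4, 16):
    print(k, round(validity_of_mean(0.80, 0.40, k), 2))
# 1 -> 0.51, 4 -> 0.68, 16 -> 0.76: gains flatten quickly, which is
# consistent with single assessors sometimes matching multiple ones.
```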