Under the heading of reliability, most textbooks present classical reliability indexes as appropriate measures of interrater agreement. It is argued here that interrater agreement is a psychometric property theoretically distinct from classical reliability: interrater agreement indexes measure the degree to which two or more raters agree in their observations of one or more behaviors on one or more subjects, and they are not grounded in classical test theory. A detailed set of formulas is presented to illustrate a set of algebraically equivalent rater agreement indexes intended to give educational and psychological researchers and practitioners a practical means of establishing a measure of rater agreement. The formulas are illustrated with a data set and can be used with dichotomous and continuous data for two or more raters, on one or more subjects, and on one or more behaviors. These rater agreement indexes are useful with performance assessments such as observations, portfolios, performance evaluations, essay writing evaluations, and authentic assessments, where multiple facets affect rater agreement.
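The abstract does not reproduce the article's formulas, but the general idea of a rater agreement index for dichotomous data can be sketched with a simple pairwise percent-agreement computation. The sketch below is a generic illustration under the assumption of one behavior rated by several raters across several subjects; it is not the specific set of algebraically equivalent indexes derived in the article.

```python
from itertools import combinations

def percent_agreement(ratings):
    """Simple pairwise percent-agreement index for dichotomous ratings.

    ratings: one list per subject, containing each rater's score for
    that subject. Returns the proportion of rater pairs, pooled across
    subjects, whose scores match. A generic illustration only, not the
    article's specific indexes.
    """
    total = 0
    agree = 0
    for subject_scores in ratings:
        # Compare every pair of raters on this subject.
        for a, b in combinations(subject_scores, 2):
            total += 1
            if a == b:
                agree += 1
    return agree / total

# Three raters scoring four subjects (1 = behavior observed, 0 = not).
data = [[1, 1, 1], [1, 0, 1], [0, 0, 0], [1, 1, 0]]
print(percent_agreement(data))  # 8 agreements out of 12 pairs ≈ 0.667
```

Extending this to multiple behaviors or continuous scores (e.g., agreement within a tolerance) follows the same pooling logic over additional facets.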