Two statistics, kappa and weighted kappa, are available for measuring agreement between two raters on a nominal scale. Formulas for the standard errors of these two statistics have been given in the literature, but they are in error. The errors appear to be in the direction of overestimation, so that use of the incorrect formulas results in conservative significance tests and confidence intervals. Valid formulas for the approximate large-sample variances are given, and their calculation is illustrated with a numerical example.

The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all disagreements may be considered equally serious; weighted kappa is appropriate when the relative seriousness of the different possible disagreements can be specified.

The papers describing these two statistics also present expressions for their standard errors. These expressions are incorrect, having been derived from the contradictory assumptions of fixed marginal totals and binomial variation of cell frequencies. Everitt (1968) derived the exact variances of weighted and unweighted kappa when the parameters are zero by assuming a generalized hypergeometric distribution. He found these expressions to be far too complicated for routine use and offered, as alternatives, expressions derived by assuming binomial distributions. These alternative expressions are incorrect, essentially for the same reason as above.

Assume that N subjects are distributed into k² cells by each of them being assigned to one of k categories by one rater and, independently, to one of the same k categories by a second
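For concreteness, the two coefficients under discussion can be computed directly from the k × k contingency table of the two raters' assignments. The following is a minimal sketch using NumPy; the function names are illustrative, and the formulas are the standard definitions from Cohen (1960, 1968), not the variance formulas this paper derives.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's (1960) kappa from a k x k contingency table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_o = np.trace(table) / n  # observed proportion of agreement
    # chance-expected agreement from the marginal totals
    p_e = (table.sum(axis=1) @ table.sum(axis=0)) / n**2
    return (p_o - p_e) / (1.0 - p_e)

def weighted_kappa(table, weights):
    """Cohen's (1968) weighted kappa.

    weights[i][j] is the agreement weight assigned to cell (i, j),
    with 1 for full agreement and smaller values for disagreements
    of increasing seriousness.
    """
    table = np.asarray(table, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = table.sum()
    p = table / n
    # cell proportions expected by chance under independent raters
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n**2
    p_o = (w * p).sum()
    p_e = (w * expected).sum()
    return (p_o - p_e) / (1.0 - p_e)
```

With identity weights (1 on the diagonal, 0 elsewhere), weighted kappa reduces to unweighted kappa, which matches the statement above that kappa treats all disagreements as equally serious.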