Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and show how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.

We thank the reviewers for their helpful comments. krmurphy@psu.edu.
COPYRIGHT © 2000 PERSONNEL PSYCHOLOGY, INC.

Measurement error can seriously obscure the effects of treatments and interventions, and attenuate the correlations between measures. Corrections for attenuation have long been available, and are widely used in applied research. However, although the idea of correcting for the known effects of measurement error is simple and uncontroversial, the operational process of correcting for attenuation can be a difficult and confusing exercise (Cronbach, Gleser, Nanda, & Rajaratnam, 1972; DeShon, 1998; Lumsden, 1976). A number of methods exist for estimating the reliability of most tests or measures, and choice of the appropriate reliability estimate is often a complex and important matter (Schmidt & Hunter, 1996).

The choice of methods for estimating reliability is especially important when correcting for measurement error in ratings of job performance. These ratings are usually obtained from a single supervisor, who uses a multi-item performance appraisal form (Murphy & Cleveland, 1995). Two methods are widely used to estimate the reliability of performance ratings. First, measures of internal consistency (e.g., coefficient alpha) can be used to estimate intrarater reliability. As will be noted below, the use of internal consistency measures to estimate the amount of measurement error in ratings is most appropriate if the term "measurement error" is used to refer to the rater's inconsistency in evaluating different facets of a subordinate's job performance. Second, measures of agreement between raters can be used to estimate interrater reliability. The use of interrater agreement measures to estimate the amount of measurement error in ratings is most appropriate if the term "measurement error" is used to refer to disagreements between similarly situated raters about individuals' levels of job performance.

The purpose of this paper is to examine the current practice of using indices of interrater correlation to estimate the proportions of true score and error variance...
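The contrast between these two reliability estimates, and its consequence for the correction for attenuation, can be made concrete with a small numerical sketch. All numbers below are invented for illustration (they are not taken from the article), and the function names are ours; the formulas are the standard ones for coefficient alpha and the classical disattenuation correction:

```python
# Illustrative sketch (hypothetical numbers): two common reliability
# estimates for performance ratings, and how the choice of estimate
# changes the correction for attenuation of a validity coefficient.
import numpy as np

def cronbach_alpha(ratings):
    """Coefficient alpha for a ratees-by-items matrix of ratings from a
    single rater -- an internal-consistency (intrarater) estimate."""
    x = np.asarray(ratings, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

def disattenuate(r_xy, r_xx, r_yy=1.0):
    """Classical correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / np.sqrt(r_xx * r_yy)

# Hypothetical ratings of 4 subordinates on a 3-item appraisal form.
ratings = [[4, 4, 3], [3, 3, 3], [5, 4, 5], [2, 3, 2]]
alpha = cronbach_alpha(ratings)   # intrarater reliability estimate

r_xy = 0.25     # observed test-rating correlation (invented)
r_inter = 0.52  # interrater correlation (invented)

# The same observed correlation is corrected quite differently
# depending on which "reliability" goes into the denominator:
print(disattenuate(r_xy, alpha))    # correction using coefficient alpha
print(disattenuate(r_xy, r_inter))  # larger correction, since the
                                    # interrater value is lower
```

Because interrater correlations are typically lower than internal-consistency estimates, treating them as reliability coefficients yields larger corrected validities; this is precisely the practice the paper examines.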