For continuous constructs, the most frequently used index of interrater agreement, rwg(1), can be problematic. Typically, rwg(1) is estimated under the assumption that a uniform distribution represents no agreement. The authors review the limitations of this uniform-null rwg(1) index and discuss alternative methods for measuring interrater agreement. A new interrater agreement statistic, awg(1), is proposed. The authors derive the awg(1) statistic and demonstrate that awg(1) is an analogue to Cohen's kappa, an interrater agreement index for nominal data. A comparison is made between agreement estimates based on the uniform-null rwg(1) and awg(1), and issues such as minimum sample size and practical significance levels are discussed. The authors close with recommendations regarding the use of rwg(1)/rwg(J) indices when a uniform null is assumed, rwg(1)/rwg(J) indices that do not assume a uniform null, awg(1)/awg(J) indices, and generalizability estimates of interrater agreement.
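The two indices contrasted in this abstract can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: rwg(1) compares the observed rating variance to the variance expected under a uniform null, (A² − 1)/12 for an A-point scale, while the awg(1) sketch follows our reading of the kappa-like derivation, scaling observed variance against the maximum variance attainable given the observed mean.

```python
import statistics

def rwg1(ratings, scale_points):
    """Single-item r_wg(1): 1 minus observed variance over uniform-null variance."""
    s2 = statistics.variance(ratings)            # observed sample variance, S^2
    sigma2_eu = (scale_points**2 - 1) / 12.0     # uniform-null expected variance
    return 1.0 - s2 / sigma2_eu

def awg1(ratings, low, high):
    """Single-item a_wg(1) sketch: variance scaled by the maximum variance
    attainable for n ratings with the observed mean on [low, high]."""
    n = len(ratings)
    m = statistics.mean(ratings)
    s2 = statistics.variance(ratings)
    # Max sample variance given mean m occurs when ratings split between
    # the scale endpoints: ((H + L)m - m^2 - HL) * n / (n - 1)
    max_var = ((high + low) * m - m**2 - high * low) * (n / (n - 1))
    return 1.0 - (2.0 * s2) / max_var

judges = [4, 4, 5, 4, 5]          # five judges rating one target on a 1-5 scale
print(round(rwg1(judges, 5), 3))  # → 0.85
print(round(awg1(judges, 1, 5), 3))  # → 0.765
```

Note how the two indices can diverge on the same data: because these judges' mean (4.4) sits near the scale ceiling, the maximum attainable variance is small, so awg(1) penalizes the observed disagreement more heavily than the uniform-null rwg(1) does.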
This study examined variable and pattern approaches to studying the influence of individual differences on both leadership emergence and leader effectiveness. Emergent leaders were identified and then followed for 9 months while effectiveness data were gathered. Bivariate correlation and regression analyses were complemented by person-based analyses. Results showed that the same pattern of individual differences (high intelligence, high dominance, high general self-efficacy, and high self-monitoring) was associated with both leadership emergence and leader effectiveness. Persons scoring high on the set of individual difference variables emerged as leaders, were promoted to leadership positions, and were rated by their superiors as effective leaders.
Undergraduate subjects possessing normative or idiosyncratic rating standards were given frame-of-reference training, rater-error training, training that controlled for structural similarities between frame-of-reference training and rater-error training, or null control training. Hypothesized pretest differences that normative raters are more accurate than idiosyncratic raters were not found. However, when data were collapsed across rating aptitude, different trainings were found to improve different measures of accuracy. Frame-of-reference trainees were most accurate on stereotype accuracy and differential accuracy, rater-error trainees were most accurate on elevation, and all groups improved on differential elevation. Results are discussed in relation to the role of rater aptitude in frame-of-reference training and the future of rater-training programs.

Recent performance appraisal studies on rater training have focused on frame-of-reference training (FOR) and rater-error training (RET; e.g., Hedge & Kavanaugh, 1988; Pulakos, 1984). These studies have generally concluded that FOR is superior to RET. As a result, researchers have begun to isolate aspects of the content of FOR that lead to increased accuracy in performance ratings (Athey & McIntyre, 1987; Sulsky & Day, 1992). However, there are several limitations in the FOR research that raise questions about the general superiority of FOR. Our purpose in this study was to address these limitations and to assess whether FOR is best for all types of rating accuracy. From a historical perspective, FOR evolved from Bernardin's work (Bernardin & Buckley, 1981; Bernardin & Pence, 1980) on rater training.
Bernardin and Pence noted that traditional RET stresses that certain rating distributions are more desirable than others and that RET facilitates the learning of a new response set, which may result in lower mean ratings (less leniency) and lower scale intercorrelations (less halo), but which may also lower levels of accuracy. Indeed, they found that RET led to less accurate ratings than those obtained from a group of untrained raters. Bernardin and Pence concluded that there was a need to develop new rater-training programs that increase rating accuracy, and Bernardin and Buckley (1981) proposed FOR as an alternative training strategy. As originally proposed, FOR initially involves the identification of raters who possess idiosyncratic performance standards.
Data collected at two law enforcement agencies were used to address three specific issues concerning the development and implementation of frame‐of‐reference rater training. First, the prototype‐anchored rating system was presented as a comprehensive method for generating an appropriate frame of reference in an organizational setting. Second, sensitivity and threshold analyses were used to demonstrate a method for identifying idiosyncratic raters (i.e., raters deviating from the appropriate frame of reference) in the rater population. Finally, areas of performance where supervisors and subordinates were likely to disagree on the frame of reference were identified. Concerning this latter issue, analyses indicated supervisors viewed poor‐performance incidents more severely than did patrol officers on several dimensions of performance. To a lesser degree, supervisors and patrol officers also differed on their perceptions of the importance of poor‐performance incidents. The implications of these findings are discussed in relation to the development and implementation of frame‐of‐reference rater training.