Improving teacher evaluation is one of the most pressing but also contested areas of educational policy. Value-added measures have received much of the attention in new evaluation systems, but they can only be used to evaluate a fraction of teachers. Classroom observations are almost universally used to assess teachers, yet their statistical properties have received far less empirical scrutiny, in particular in consequential evaluation systems. In this essay, we highlight some conceptual and empirical challenges that are similar across these different measures of teacher quality. Based on a review of empirical research, we argue that we need much more research focused on observations as performance measures. We conclude by sketching out an agenda for future research in this area.
Keywords: accountability; classroom research; educational policy; policy analysis; teacher assessment
REVIEWS/ESSAYS, AUGUST/SEPTEMBER 2016
Given their prevalent use, we know surprisingly little about the statistical properties of classroom observations in consequential personnel decisions. Indeed, much of what we know derives from extensive analysis of a large-scale research study, the Measures of Effective Teaching (MET) Project (cf. Kane & Staiger, 2012), and it is unclear how these findings might translate when evaluation reform is put into practice (Goldhaber, 2015). Will real-world classroom observations differentiate among teachers? Will they be reliable? Will teachers receive actionable feedback, leading them to seek and receive high-quality professional development? The answers to questions like these are key to understanding how the use of observational measures of performance will affect the quality of the teacher workforce.
In this article, we highlight some conceptual and empirical challenges that are similar across different measures of teacher performance. We focus on the existing validity evidence around classroom observations as evaluation measures, much of which is derived from research studies as opposed to real-world evaluation systems. We highlight what we know about the stability of observational measures across raters and educational contexts. We speculate on the implications of the extant literature, and we ask what additional evidence about observations we would need to feel confident that they could be featured in fair and useful evaluation systems. We attempt to answer some of these questions while describing the conceptual issues that arise when measuring teachers' classroom performance based on observations. Based on a review of empirical research, we argue that we need more research focused on observations as performance measures, particularly from authentic settings where observational performance measures are used in consequential evaluation systems, and we sketch out an agenda for future research.
Given space constraints, our goal is not to provide a comprehensive synthesis of the extant literature around classroom observations. Such a review would be helpful but goes beyond th...