This article addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using a realistic performance measurement scheme based on hypothesis testing, the authors develop error rate formulas based on ordinary least squares and Empirical Bayes estimators. Empirical results suggest that value-added estimates are likely to be noisy with the amounts of data that are typically used in practice. Type I and II error rates for comparing a teacher's performance to the average are likely to be about 25% with 3 years of data and 35% with 1 year of data. Corresponding rates of overall false positive and false negative errors are 10% and 20%, respectively. Lower error rates can be achieved if schools are the performance unit. The results suggest that policymakers must carefully consider likely system error rates when using value-added estimates to make high-stakes decisions regarding educators.
Keywords: value-added models, performance measurement systems, student learning gains, false positive and negative error rates
Student learning gains, as measured by students' scores on pretests and posttests, are increasingly being used to evaluate educator performance. Known as "value-added" measures of performance, the average gains of students taught by a given teacher, instructional team, or school are often the most important outcomes for performance measurement systems that aim to identify instructional staff for special treatment, such as rewards and sanctions. Spurred by the expanding role of value-added measures in educational policy decisions, an emerging body of research has consistently found, using available data, that value-added estimates based on a few years of data can be imprecise. In this article, we add to this literature by systematically examining, from a design perspective, misclassification rates for commonly used performance measurement systems that rely on hypothesis testing.
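The sketch below is a minimal illustration, with hypothetical effect sizes and standard errors, of how Type I and II error rates can be computed for a simple one-sided z-test of a teacher's value-added estimate against the average. It shows the general mechanics only and does not reproduce the authors' error rate formulas for OLS and Empirical Bayes estimators.

```python
from scipy.stats import norm

def error_rates(true_gap, se, alpha=0.05):
    """Illustrative Type I / Type II error rates for a one-sided z-test of
    H0: teacher effect <= 0, against a teacher whose true effect is `true_gap`.

    true_gap : hypothetical true difference from the average teacher
    se       : hypothetical standard error of the value-added estimate
    """
    crit = norm.ppf(1 - alpha)                 # rejection threshold in z units
    type_i = alpha                             # false positive rate for a truly average teacher
    type_ii = norm.cdf(crit - true_gap / se)   # false negative rate for the true_gap teacher
    return type_i, type_ii

# Example with made-up numbers: additional years of data shrink the standard
# error (roughly by 1/sqrt(years)) and so reduce the Type II error rate.
print(error_rates(true_gap=0.20, se=0.12))              # 1 year of data (assumed SE)
print(error_rates(true_gap=0.20, se=0.12 / 3 ** 0.5))   # 3 years of data
```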
In randomized control trials (RCTs) in the education field, the complier average causal effect (CACE) parameter is often of policy interest, because it pertains to intervention effects for students who receive a meaningful dose of treatment services. This article uses a causal inference and instrumental variables framework to examine the identification and estimation of the CACE parameter for two-level clustered RCTs. The article also provides simple asymptotic variance formulas for CACE impact estimators measured in nominal and standard deviation units. In the empirical work, data from 10 large RCTs are used to compare significance findings using correct CACE variance estimators and commonly used approximations that ignore the estimation error in service receipt rates and outcome standard deviations. The key finding is that the variance corrections have very little effect on the standard errors of standardized CACE impact estimators. Across the examined outcomes, the correction terms typically raise the standard errors by less than 1%, and change p values at the fourth or higher decimal place.
Manuscript received April 16, 2010; revision received April 26, 2010; accepted May 24, 2010.
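For reference, the display below gives the standard instrumental-variables identity for the CACE estimator and a delta-method approximation to its variance; it is a generic textbook form, not necessarily the exact clustered-design formulas derived in the article.

$$
\hat{\beta}_{\mathrm{CACE}} \;=\; \frac{\hat{\beta}_{\mathrm{ITT}}}{\hat{p}},
\qquad
\mathrm{Var}\!\left(\hat{\beta}_{\mathrm{CACE}}\right) \;\approx\;
\frac{\mathrm{Var}\!\left(\hat{\beta}_{\mathrm{ITT}}\right)}{\hat{p}^{2}}
\;+\;
\frac{\hat{\beta}_{\mathrm{ITT}}^{2}}{\hat{p}^{4}}\,\mathrm{Var}\!\left(\hat{p}\right),
$$

where $\hat{\beta}_{\mathrm{ITT}}$ is the intent-to-treat impact estimate, $\hat{p}$ is the estimated treatment-control difference in service receipt rates, and the covariance term between the two estimators is omitted for simplicity. The commonly used approximation evaluated in the article keeps only the first term; standardizing by an estimated outcome standard deviation introduces a further, analogous correction.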
States across the country are developing systems for evaluating school principals on the basis of student achievement growth. A common approach is to hold principals accountable for the value added of their schools—that is, schools’ contributions to student achievement growth. In theory, school value added can reflect not only principals’ effectiveness but also other school-specific influences on student achievement growth that are outside of principals’ control. In this paper, we isolate principals’ effects on student achievement growth and examine the extent to which school value added captures the effects that principals persistently demonstrate. Using longitudinal data on the math and reading outcomes of fourth- through eighth-grade students in Pennsylvania, our findings indicate that school value added provides very poor information for revealing principals’ persistent levels of effectiveness.
Relative to the randomized controlled trial (RCT), the basic regression discontinuity (RD) design suffers from lower statistical power and lesser ability to generalize causal estimates away from the treatment eligibility cutoff. This paper seeks to mitigate these limitations by adding an untreated outcome comparison function that is measured along all or most of the assignment variable. When added to the usual treated and untreated outcomes observed in the basic RD, a comparative RD (CRD) design results. One version of CRD adds a pretest measure of the study outcome (CRD-Pre); another adds posttest outcomes from a nonequivalent comparison group (CRD-CG). We describe how these designs can be used to identify unbiased causal effects away from the cutoff under the assumption that a common, stable functional form describes how untreated outcomes vary with the assignment variable, both in the basic RD and in the added outcomes data (pretests or a comparison group's posttest). We then create the two CRD designs using data from the National Head Start Impact Study, a large-scale RCT. For both designs, we find that all untreated outcome functions are parallel, which lends support to CRD's identifying assumptions. Our results also indicate that CRD-Pre and CRD-CG both yield impact estimates at the cutoff that have a similarly small bias as, but are more precise than, the basic RD's impact estimates. In addition, both CRD designs produce estimates of impacts away from the cutoff that have relatively little bias compared to estimates of the same parameter from the RCT design. This common finding appears to be driven by two different mechanisms. In this instance of CRD-CG, potential untreated outcomes were likely independent of the assignment variable from the start. This was not the case with CRD-Pre. However, fitting a model using the observed pretests and untreated posttests to account for the initial dependence generated an accurate prediction of the missing counterfactual. The result was an unbiased causal estimate away from the cutoff, conditional on this successful prediction of the untreated outcomes of the treated.
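The following sketch illustrates one way the CRD-CG logic could be implemented. The linear functional form, the constant offset between the study and comparison groups, and all variable names are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def crd_cg_effect_at(x0, study_x, study_y, treated, comp_x, comp_y):
    """Illustrative CRD-CG estimate of the treatment effect at a point x0 above the cutoff.

    study_x, study_y : assignment variable and outcome in the basic RD sample
    treated          : boolean array, True for study units above the cutoff
    comp_x, comp_y   : assignment variable and untreated outcome in the comparison group
    """
    # 1) Fit the untreated outcome function in the comparison group across all of x.
    comp_coef = np.polyfit(comp_x, comp_y, deg=1)
    # 2) Use untreated study units (below the cutoff) to estimate a stable offset
    #    between the study group's and the comparison group's untreated outcomes.
    below = ~treated
    offset = np.mean(study_y[below] - np.polyval(comp_coef, study_x[below]))
    # 3) Fit the treated outcome function above the cutoff, then compare the treated
    #    prediction at x0 with the predicted untreated counterfactual at x0.
    treat_coef = np.polyfit(study_x[treated], study_y[treated], deg=1)
    y1_hat = np.polyval(treat_coef, x0)
    y0_hat = np.polyval(comp_coef, x0) + offset
    return y1_hat - y0_hat
```

The key identifying assumption mirrors the abstract: the untreated outcome function has a common, stable shape in the study and comparison data, so the comparison group (shifted by the offset estimated below the cutoff) supplies the missing counterfactual above it.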