The use of propensity scores to control for pretreatment imbalances on observed variables in non-randomized or observational studies examining the causal effects of treatments or interventions has become widespread over the past decade. For settings with two conditions of interest, such as a treatment and a control, inverse probability of treatment weighted (IPTW) estimation with propensity scores estimated via boosted models has been shown in simulation studies to yield causal effect estimates with desirable properties. There are tools (e.g., the twang package in R) and guidance for implementing this method with two treatments. However, no such guidance exists for analyses of three or more treatments. The goals of this paper are twofold: (1) to provide step-by-step guidance for researchers who want to implement propensity score weighting for multiple treatments and (2) to propose the use of generalized boosted models (GBM) for estimation of the necessary propensity score weights. We define the causal quantities that may be of interest to studies of multiple treatments and derive weighted estimators of those quantities. We present a detailed plan for using GBM to estimate propensity scores and using those scores to estimate weights and causal effects. Tools for assessing balance and overlap of pretreatment variables among treatment groups in the context of multiple treatments are also provided. A case study examining the effects of three treatment programs for adolescent substance abuse demonstrates the methods.
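As a rough illustration of the weighting step described in this abstract, the sketch below computes inverse probability of treatment weights for three treatments and uses them to form weighted outcome means. It is not the paper's implementation: the propensity scores are assumed to have been estimated already (for example with a boosted model), and the function names `iptw_weights` and `weighted_means`, the treatment labels, and all numbers are hypothetical.

```python
from collections import defaultdict


def iptw_weights(treatments, propensities):
    """Return one weight per unit: 1 / P(T = observed treatment | X).

    `propensities` holds, for each unit, a dict mapping every treatment
    label to that unit's estimated probability of receiving it.
    """
    weights = []
    for t, p in zip(treatments, propensities):
        p_t = p[t]
        if not 0.0 < p_t <= 1.0:
            raise ValueError(f"propensity for treatment {t!r} must be in (0, 1]")
        weights.append(1.0 / p_t)
    return weights


def weighted_means(treatments, outcomes, weights):
    """Return the weighted mean outcome for each treatment group."""
    num = defaultdict(float)
    den = defaultdict(float)
    for t, y, w in zip(treatments, outcomes, weights):
        num[t] += w * y
        den[t] += w
    return {t: num[t] / den[t] for t in num}


# Hypothetical toy data: six units, three treatments (A, B, C).
treatments = ["A", "A", "B", "B", "C", "C"]
outcomes = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0]
propensities = [
    {"A": 0.50, "B": 0.25, "C": 0.25},
    {"A": 0.25, "B": 0.50, "C": 0.25},
    {"A": 0.25, "B": 0.50, "C": 0.25},
    {"A": 0.20, "B": 0.40, "C": 0.40},
    {"A": 0.25, "B": 0.25, "C": 0.50},
    {"A": 0.40, "B": 0.40, "C": 0.20},
]

weights = iptw_weights(treatments, propensities)
means = weighted_means(treatments, outcomes, weights)

# A pairwise effect estimate is the difference of two weighted means.
effect_a_vs_b = means["A"] - means["B"]
```

Under these assumptions, each weighted group mean estimates the population mean outcome had everyone received that treatment, so differences between them estimate pairwise average treatment effects. Weights targeting other causal quantities would be constructed differently.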
Causal effect modeling with naturalistic rather than experimental data is challenging. In observational studies participants in different treatment conditions may also differ on pretreatment characteristics that influence outcomes. Propensity score methods can theoretically eliminate these confounds for all observed covariates, but accurate estimation of propensity scores is impeded by large numbers of covariates, uncertain functional forms for their associations with treatment selection, and other problems. This article demonstrates that boosting, a modern statistical technique, can overcome many of these obstacles. The authors illustrate this approach with a study of adolescent probationers in substance abuse treatment programs. Propensity score weights estimated using boosting eliminate most pretreatment group differences and substantially alter the apparent relative effects of adolescent substance abuse treatment.
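The claim that weighting removes pretreatment group differences is usually checked with a balance statistic. The sketch below shows one such check, a weighted standardized mean difference on a single covariate. It rests on assumptions not taken from the article: the choice of denominator (the unweighted standard deviation of the treatment group), the helper names, and the toy numbers are all hypothetical.

```python
import statistics


def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x_treat, w_treat, x_comp, w_comp):
    """Difference in weighted covariate means, scaled by the
    unweighted standard deviation of the treatment group (an assumed
    convention; other denominators are also used in practice)."""
    diff = weighted_mean(x_treat, w_treat) - weighted_mean(x_comp, w_comp)
    return diff / statistics.stdev(x_treat)


# Hypothetical covariate values for a treatment and a comparison group.
x_treat = [1.0, 2.0, 3.0]
x_comp = [2.0, 3.0, 4.0]
ones = [1.0, 1.0, 1.0]

# Before weighting: every unit counts equally.
smd_unweighted = standardized_mean_difference(x_treat, ones, x_comp, ones)

# After weighting: comparison units resembling the treatment group
# receive larger (hypothetical) propensity score weights.
w_comp = [3.0, 2.0, 1.0]
smd_weighted = standardized_mean_difference(x_treat, ones, x_comp, w_comp)
```

In this toy case the absolute standardized difference shrinks after weighting, which is the pattern one hopes to see across all pretreatment covariates.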
The use of complex value-added models that attempt to isolate the contributions of teachers or schools to student development is increasing. Several variations on these models are being applied in the research literature, and policy makers have expressed interest in using these models for evaluating teachers and schools. In this article, we present a general multivariate, longitudinal mixed-model that incorporates the complex grouping structures inherent to longitudinal student data linked to teachers. We summarize the principal existing modeling approaches, show how these approaches are special cases of the proposed model, and discuss possible extensions to model more complex data structures. We present simulation and analytical results that clarify the interplay between estimated teacher effects and repeated outcomes on students over time. We also explore the potential impact of model misspecifications, including missing student covariates and assumptions about the accumulation of teacher effects over time, on key inferences made from the models. We conclude that mixed models that account for student correlation over time are reasonably robust to such misspecifications when all the schools in the sample serve similar student populations. However, student characteristics are likely to confound estimated teacher effects when schools serve distinctly different populations.
School-based drug prevention programs can prevent occasional and more serious drug use, help low- to high-risk adolescents, and be effective in diverse school environments.
The utility of value-added estimates of teachers' effects on student test scores depends on whether they can distinguish between high- and low-productivity teachers and predict future teacher performance. This article studies the year-to-year variability in value-added measures for elementary and middle school mathematics teachers from five large Florida school districts. We find year-to-year correlations in value-added measures in the range of 0.2–0.5 for elementary school and 0.3–0.7 for middle school teachers. Much of the variation in measured teacher performance (roughly 30–60 percent) is due to sampling error from “noise” in student test scores. Persistent teacher effects account for about 50 percent of the variation not due to noise for elementary teachers and about 70 percent for middle school teachers. The remaining variance is due to teacher-level time-varying factors, but little of it is explained by observed teacher characteristics. Averaging estimates from two years greatly improves their ability to predict future performance.
This article develops a validity argument approach for use on observation protocols currently used to assess teacher quality for high-stakes personnel and professional development decisions. After defining the teaching quality domain, we articulate an interpretive argument for observation protocols. To illustrate the types of evidence that might compose a validity argument, we draw on data from a validity study of the Classroom Assessment Scoring System for secondary classrooms. Based on data from 82 Algebra classrooms, we illustrate how data from observation scores, value-added models, generalizability studies, and measures of teacher knowledge, student achievement, and teacher and student beliefs could be used to build a validity argument for observation protocols. Strengths and limitations of the validity argument approach as well as the issues the approach raises for observation protocol validity research are considered. Recent federal legislation has put states and districts under unprecedented pressure to improve teaching quality through evaluation (United States Department of Education, 2009; United States Department of Education Office of Planning Evaluation and Policy Development, 2010).