2012
DOI: 10.1002/pam.21660

External Validity in Policy Evaluations That Choose Sites Purposively

Abstract: Evaluations of the impact of social programs are often carried out in multiple “sites,” such as school districts, housing authorities, local TANF offices, or One-Stop Career Centers. Most evaluations select sites purposively following a process that is nonrandom. Unfortunately, purposive site selection can produce a sample of sites that is not representative of the population of interest for the program. In this paper, we propose a conceptual model of purposive site selection. We begin with the proposition tha…

Cited by 105 publications (124 citation statements)
References 19 publications
“…IES also sponsors large-scale evaluations through contracts to major research firms (e.g., Abt Associates, Mathematica Policy Research, MDRC). These evaluations usually select a sample designed to cover all regions of the country, but sites are selected purposively to reduce costs and sometimes with other objectives in mind (e.g., to test the intervention in sites where it will produce the greatest “contrast” between the treatment and control conditions, suggesting that it may have the greatest impact); they are rarely selected randomly to be formally representative of any broader population of potential interest to policymakers (Olsen et al., 2013). …”
Section: Background and Overview (mentioning)
confidence: 99%
“…In fact, experiments are frequently conducted in only one or two localities that cannot claim to be representative of the nation, state, or other jurisdiction for which policy is made. (See Olsen et al. (2013) for a derivation of the bias that may occur when experiments are conducted with nonrepresentative populations.)…”
Section: Internal Validity and External Validity of Experimental Estimates (mentioning)
confidence: 99%
“…The average treatment effect is typically estimated using the n units in the sample and a multilevel model that accounts for the study design (e.g., randomized block design or cluster randomized design; Raudenbush & Bryk, 2002), and the impact of the experiment is evaluated by comparing this estimate to its standard error. Imai, King, and Stuart (2008) and Olsen et al. (2012) have shown that when the sample is not representative of the population and when site-specific treatment effects vary, the sample-based estimate of the population average treatment effect is biased. Our goal here, therefore, is to develop a strategy for selecting the n units in the sample S so that the sample is compositionally similar to the N units in the inference population P, thereby leading to a less biased and more precise estimate of the population average treatment effect.…”
Section: Definitions, Assumptions, and Goals (mentioning)
confidence: 97%
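
To make the bias claim in the last excerpt concrete, here is a minimal sketch of the standard decomposition, under simplifying assumptions (equally sized sites; the notation is illustrative rather than the cited authors' exact notation: Δ_j is site j's treatment effect, p_j its probability of selection into the sample, P the inference population of N sites, and S the sample of n sites):

% Population average treatment effect and the (approximate) expectation
% of its sample analogue under purposive selection with probabilities p_j:
\[
  \Delta_P = \frac{1}{N}\sum_{j \in P} \Delta_j,
  \qquad
  \mathbb{E}\big[\hat{\Delta}_S\big] \approx \frac{\sum_{j \in P} p_j \, \Delta_j}{\sum_{j \in P} p_j},
\]
% so the external-validity bias reduces to a covariance between
% selection probabilities and site-specific effects:
\[
  \mathbb{E}\big[\hat{\Delta}_S\big] - \Delta_P
  = \frac{\mathrm{Cov}(p_j, \Delta_j)}{\bar{p}},
  \qquad
  \bar{p} = \frac{1}{N}\sum_{j \in P} p_j = \frac{n}{N}.
\]

In words, the bias is the covariance between selection probabilities and site effects, scaled by the mean selection probability. It vanishes when selection is random (p_j constant) or when effects do not vary across sites (Δ_j constant), which are exactly the two conditions flagged in the excerpts above; purposive selection that favors high-contrast sites makes this covariance positive and the sample estimate overstate the population effect.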