A Simple Method for Estimating Interactions Between a Treatment and a Large Number of Covariates

Tian, Lu; Alizadeh, Ash A.; Gentles, Andrew J.; Tibshirani, Robert

doi:10.1080/01621459.2014.951443

Cited by 305 publications (431 citation statements, 429 of them mentioning)

References 38 publications

“…This is pursued in the high-dimensional setting in ref. 19; this work advocates solving the Lasso on a reduced set of modified covariates, rather than the full set of covariate by treatment interactions, and includes extensions to binary outcomes and survival data. The recent work in ref.…”

Section: Theoretical Results (mentioning; confidence: 99%)
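The modified-covariate idea in the excerpt quoted above can be illustrated with a small sketch. This is not the authors' implementation. It assumes a 1:1 randomized trial with treatment coded T in {-1, +1}, forms the modified covariates W = Z·T/2, and fits a Lasso of the centered outcome on W. If Y = μ(Z) + (T/2)·Zᵀγ + ε, then the main effect μ(Z) is uncorrelated with W, because E[T] = 0 and T is independent of Z. The fitted coefficients therefore target the interaction vector γ without any model for μ. The solver (plain coordinate descent), the helper names `lasso_cd` and `modified_covariates`, the penalty value, and the simulated data are all illustrative assumptions.

```python
import random

def soft_threshold(x, lam):
    """Shrink x toward zero by lam (the Lasso proximal step)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 at the start
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual that excludes b_j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def modified_covariates(Z, T):
    """Hypothetical helper: W_i = Z_i * T_i / 2 for treatment coded T_i in {-1, +1}."""
    return [[z * t / 2.0 for z in row] for row, t in zip(Z, T)]

# Simulated 1:1 trial (illustrative only): Z_0 is purely prognostic,
# and Z_1 is the only covariate that interacts with treatment (gamma_1 = 1.5).
rng = random.Random(0)
n, p = 2000, 5
Z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
T = [rng.choice((-1, 1)) for _ in range(n)]
Y = [2.0 * z[0] + (t / 2.0) * 1.5 * z[1] + rng.gauss(0.0, 1.0) for z, t in zip(Z, T)]

y_bar = sum(Y) / n
W = modified_covariates(Z, T)
gamma_hat = lasso_cd(W, [y - y_bar for y in Y], lam=0.02)
print([round(g, 2) for g in gamma_hat])
```

In this simulation the coefficient at index 1 should land near the true 1.5, slightly shrunk by the penalty. The purely prognostic covariate Z_0 should receive only a small coefficient, because its main effect is orthogonal to the modified covariates.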

Lasso adjustments of treatment effect estimates in randomized experiments

Bloniarz, Liu, Zhang, et al. (2016). Proc. Natl. Acad. Sci. U.S.A.


We provide a principled way for investigators to analyze randomized experiments when the number of covariates is large. Investigators often use linear multivariate regression to analyze randomized experiments instead of simply reporting the difference of means between treatment and control groups. Their aim is to reduce the variance of the estimated treatment effect by adjusting for covariates. If there are a large number of covariates relative to the number of observations, regression may perform poorly because of overfitting. In such cases, the least absolute shrinkage and selection operator (Lasso) may be helpful. We study the resulting Lasso-based treatment effect estimator under the Neyman-Rubin model of randomized experiments. We present theoretical conditions that guarantee that the estimator is more efficient than the simple difference-of-means estimator, and we provide a conservative estimator of the asymptotic variance, which can yield tighter confidence intervals than the difference-of-means estimator. Simulation and data examples show that Lasso-based adjustment can be advantageous even when the number of covariates is less than the number of observations. Specifically, a variant using Lasso for selection and ordinary least squares (OLS) for estimation performs particularly well, and it chooses a smoothing parameter based on the combined performance of Lasso and OLS.

Keywords: randomized experiment | Neyman-Rubin model | average treatment effect | high-dimensional statistics | Lasso

Randomized experiments are widely used to measure the efficacy of treatments. Randomization ensures that treatment assignment is not influenced by any potential confounding factors, both observed and unobserved. Experiments are particularly useful when there is no rigorous theory of a system's dynamics and full identification of confounders would be impossible.
This advantage was cast elegantly in mathematical terms in the early 20th century by Jerzy Neyman, who introduced a simple model for randomized experiments, which showed that the difference of average outcomes in the treatment and control groups is statistically unbiased for the average treatment effect (ATE) over the experimental sample (1).

However, no experiment occurs in a vacuum of scientific knowledge. Often, baseline covariate information is collected about individuals in an experiment. Even when treatment assignment is not related to these covariates, analyses of experimental outcomes often take them into account with the goal of improving the accuracy of treatment effect estimates. In modern randomized experiments, the number of covariates can be very large, sometimes even larger than the number of individuals in the study. In clinical trials overseen by regulatory bodies like the Food and Drug Administration and the Medicines and Healthcare products Regulatory Agency, demographic and genetic information may be recorded about each patient. In applications in the tech industry, where randomization is often called A/B testing, there is often a huge amount of behavioral data ...
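The covariate adjustment described in this abstract can be illustrated with a small sketch. It assumes the adjusted estimator has the form [ȳ_A - (x̄_A - x̄)·β̂_A] - [ȳ_B - (x̄_B - x̄)·β̂_B], where A and B are the treated and control groups and x̄ is the pooled covariate mean. Under that form, each arm's mean outcome is shifted to the value predicted at the pooled covariate mean, which removes chance covariate imbalance. The paper estimates the per-arm slopes with the Lasso (or with Lasso for selection followed by OLS). To keep the sketch short and dependency-free, it uses one covariate and an OLS slope instead. The function names and toy data are hypothetical.

```python
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope of y on a single covariate x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def adjusted_ate(x, y, treated, slope_fn=ols_slope):
    """[ybar_A - (xbar_A - xbar) * b_A] - [ybar_B - (xbar_B - xbar) * b_B]."""
    xbar = mean(x)  # pooled covariate mean over both arms
    adjusted_means = {}
    for arm in (True, False):
        xs = [xi for xi, t in zip(x, treated) if t == arm]
        ys = [yi for yi, t in zip(y, treated) if t == arm]
        b = slope_fn(xs, ys)  # per-arm slope; a Lasso-based fit in the paper
        adjusted_means[arm] = mean(ys) - (mean(xs) - xbar) * b
    return adjusted_means[True] - adjusted_means[False]

def diff_in_means(y, treated):
    """Unadjusted difference of mean outcomes between the two arms."""
    return (mean([yi for yi, t in zip(y, treated) if t])
            - mean([yi for yi, t in zip(y, treated) if not t]))

# Toy data with chance imbalance: treated units have smaller x on average.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
treated = [True, False, True, True, False, False]
y = [1.0 + 3.0 * xi + (2.0 if t else 0.0) for xi, t in zip(x, treated)]  # true effect 2

print(diff_in_means(y, treated))    # -3.0: the imbalance in x masks the effect
print(adjusted_ate(x, y, treated))  # approximately 2.0
```

Both arms share the same slope in this noiseless toy example, so the adjustment removes the imbalance exactly. With noisy outcomes and many covariates, the choice of slope estimator (Lasso, Lasso+OLS, or OLS) is the question the paper studies.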


“…Imai and Ratkovic [2013], Signorovitch [2007], Tian et al [2014] and Weisberg and Pontes [2015] develop lasso-like methods for causal inference in a sparse high-dimensional linear setting. Beygelzimer and Langford [2009], Dudík et al [2011], and others discuss procedures for transforming outcomes that enable off-the-shelf loss minimization methods to be used for optimal treatment policy estimation.…”

Section: Related Work (mentioning; confidence: 99%)
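One common form of the outcome transformation mentioned in this excerpt applies to a randomized design with known treatment probability e: Y* = Y·(W - e)/(e(1 - e)). Its conditional mean given the covariates equals the treatment effect at those covariates, so any off-the-shelf regression of Y* on X targets heterogeneous effects. The sketch below illustrates that property under this assumption and does not reproduce any cited paper's procedure. It simulates a trial with e = 0.5 and recovers subgroup effects by averaging Y* within each subgroup, the simplest possible regression. The simulation design and the function name are hypothetical.

```python
import random

def transformed_outcome(y, w, e):
    """Y* = Y * (W - e) / (e * (1 - e)); E[Y* | X] equals the treatment effect at X."""
    return y * (w - e) / (e * (1.0 - e))

# Hypothetical randomized trial: treatment probability e = 0.5,
# effect 2.0 when x > 0 and no effect otherwise.
rng = random.Random(1)
e = 0.5
xs, ystars = [], []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    w = 1 if rng.random() < e else 0
    tau = 2.0 if x > 0 else 0.0
    y = x + tau * w + rng.gauss(0.0, 1.0)
    xs.append(x)
    ystars.append(transformed_outcome(y, w, e))

# Regressing Y* on a subgroup indicator reduces to a within-group average.
pos = [v for x, v in zip(xs, ystars) if x > 0]
neg = [v for x, v in zip(xs, ystars) if x <= 0]
cate_pos = sum(pos) / len(pos)
cate_neg = sum(neg) / len(neg)
print(round(cate_pos, 2), round(cate_neg, 2))
```

The transformed outcome is unbiased but noisy: with e = 0.5 it equals ±2Y. This is why many observations are needed to estimate even two subgroup means precisely.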

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Wager and Athey (2018). Journal of the American Statistical Association


Many scientific and engineering challenges, ranging from personalized medicine to customized marketing recommendations, require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.


“…Imai and Ratkovic (2013), Signorovitch (2007), Tian et al (2014), and Weisberg and Pontes (2015) develop lasso-like methods for causal inference and treatment effect heterogeneity in a setting where there are potentially a large number of covariates, so that regularization methods are used to discover which covariates are important. When the treatment effect interactions of interest have low dimension (that is, a small number of covariates have important interactions with the treatment), valid confidence intervals can be derived (without using sample splitting as described above); see, e.g., Chernozhukov, Hansen, and Spindler (2015) and references therein.…”

Section: Treatment Effect Heterogeneity Using Regularized Regression (mentioning; confidence: 99%)

“…Some of the methods (e.g. Tian et al (2014)) propose modeling heterogeneity in the treatment and control groups separately, and then taking the difference; this can be inefficient if the covariates that affect the level of outcomes are distinct from those that affect treatment effect heterogeneity. An alternative approach is to incorporate interactions of the treatment with covariates as covariates, and then allow LASSO to select which covariates are important.…”

Section: Treatment Effect Heterogeneity Using Regularized Regression (mentioning; confidence: 99%)
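The alternative described in this excerpt adds treatment-by-covariate interactions as extra columns and lets the Lasso choose among them. That approach starts from a design matrix like the one built below. This is a minimal sketch: the function name, the column-naming scheme, and the covariate names are hypothetical, and the treatment is assumed to be a 0/1 indicator. The resulting matrix would then be passed to a Lasso solver of the analyst's choice. A nonzero coefficient on a `t:` column would flag that covariate as a candidate modifier of the treatment effect.

```python
def interaction_design(X, t, names):
    """Columns: main effects X, the treatment indicator t, and every t * X_j product."""
    header = list(names) + ["t"] + [f"t:{name}" for name in names]
    rows = []
    for row, ti in zip(X, t):
        rows.append(list(row) + [float(ti)] + [ti * xj for xj in row])
    return header, rows

# Two hypothetical units with covariates "age" and "dose"; the first is treated.
header, rows = interaction_design(
    X=[[0.5, -1.0], [2.0, 3.0]],
    t=[1, 0],
    names=["age", "dose"],
)
print(header)  # ['age', 'dose', 't', 't:age', 't:dose']
print(rows)    # [[0.5, -1.0, 1.0, 0.5, -1.0], [2.0, 3.0, 0.0, 0.0, 0.0]]
```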

The Econometrics of Randomized Experiments

Athey and Imbens (2017). Handbook of Field Experiments


In this chapter, we present econometric and statistical methods for analyzing randomized experiments. For basic experiments we stress randomization-based inference as opposed to sampling-based inference. In randomization-based inference, uncertainty in estimates arises naturally from the random assignment of the treatments, rather than from hypothesized sampling from a large population. We show how this perspective relates to regression analyses for randomized experiments. We discuss the analyses of stratified, paired, and clustered randomized experiments, and we stress the general efficiency gains from stratification. We also discuss complications in randomized experiments such as noncompliance. In the presence of noncompliance we contrast intention-to-treat analyses with instrumental variables analyses allowing for general treatment effect heterogeneity. We consider in detail estimation and inference for heterogeneous treatment effects in settings with (possibly many) covariates. These methods allow researchers to explore heterogeneity by identifying subpopulations with different treatment effects while maintaining the ability to construct valid confidence intervals. We also discuss optimal assignment to treatment based on covariates in such settings. Finally, we discuss estimation and inference in experiments in settings with interactions between units, both in general network settings and in settings where the population is partitioned into groups with all interactions contained within these groups.

JEL Classification: C01, C13, C18, C21, C52, C54
