In environmental epidemiology, it is critically important to identify subpopulations that are most vulnerable to the adverse effects of air pollution so we can develop targeted interventions. In recent years, there have been many methodological developments for addressing heterogeneity of treatment effects in causal inference. A common approach is to estimate the conditional average treatment effect (CATE) for a pre-specified covariate set. However, this approach does not provide an easy-to-interpret tool for identifying susceptible subpopulations or discover new subpopulations that are not defined a priori by the researchers. In this paper, we propose a new causal rule ensemble (CRE) method with two features simultaneously: 1) ensuring interpretability by revealing heterogeneous treatment effect structures in terms of decision rules and 2) providing CATE estimates with high statistical precision similar to causal machine learning algorithms. We provide theoretical results that guarantee consistency of the estimated causal effects for the newly discovered causal rules. Furthermore, via simulations, we show that the CRE method has competitive performance on its ability to discover subpopulations and then accurately estimate the causal effects. We also develop a new sensitivity analysis method that examine robustness to unmeasured confounding bias. Lastly, we apply the CRE method to the study of the effects of long-term exposure to air pollution on the 5-year mortality rate of the New England Medicare-enrolled population in United States. Code is available at https://github.com/kwonsang/causal rule ensemble.
This paper introduces an innovative Bayesian machine learning algorithm to draw interpretable inference on heterogeneous causal effects in the presence of imperfect compliance (e.g., under an irregular assignment mechanism). We show, through Monte Carlo simulations, that the proposed Bayesian Causal Forest with Instrumental Variable (BCF-IV) methodology outperforms other machine learning techniques tailored for causal inference in discovering and estimating the heterogeneous causal effects while controlling for the familywise error rate (or -less stringently -for the false discovery rate) at leaves' level. BCF-IV sheds a light on the heterogeneity of causal effects in instrumental variable scenarios and, in turn, provides the policy-makers with a relevant tool for targeted policies. Its empirical application evaluates the effects of additional funding on students' performances. The results indicate that BCF-IV could be used to enhance the effectiveness of school funding on students' performance.
This paper discusses the fundamental principles of causal inference-the area of statistics that estimates the effect of specific occurrences, treatments, interventions, and exposures on a given outcome from experimental and observational data. We explain the key assumptions required to identify causal effects, and highlight the challenges associated with the use of observational data. We emphasize that experimental thinking is crucial in causal inference. The quality of the data (not necessarily the quantity), the study design, the degree to which the assumptions are met, and the rigor of the statistical analysis allow us to credibly infer causal effects.Although we advocate leveraging the use of big data and the application of machine learning (ML) algorithms for estimating causal effects, they are not a substitute of thoughtful study design. Concepts are illustrated via examples.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.