2022
DOI: 10.1080/00273171.2021.1994364

Tuning Random Forests for Causal Inference under Cluster-Level Unmeasured Confounding

Abstract: Recently, there has been growing interest in using machine learning methods for causal inference because of their automatic and flexible ability to model the propensity score and the outcome model. However, almost all machine learning methods for causal inference have been studied under the assumption of no unmeasured confounding, and there is little work on handling omitted/unmeasured variable bias. This paper focuses on a machine learning method based on random forests known as Causal Forests and presents five modifications…

Cited by 8 publications (23 citation statements) · References 52 publications
“…The second and third modifications are motivated by (i) the study by Suk and Kang (2022b) and (ii) the doubly robust property of the TMLE estimator. Among the five modifications of Suk and Kang (2022b), we use the modifications based on random-effects propensity scores and fixed-effects propensity scores because they are easy to implement. If we correctly specify fixed-effects or random-effects propensity score models within each group and inject them inside TMLE, it becomes robust to bias from cluster-level unmeasured confounders and will yield a consistent estimator of the ATE.…”
Section: Our Modifications for TMLE (mentioning)
confidence: 99%
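The modification quoted above plugs a cluster fixed-effects (or random-effects) propensity score into TMLE as the treatment-mechanism estimate. The following is a minimal sketch of the fixed-effects variant only, not the authors' code; the data frame, column names, and downstream TMLE routine are hypothetical.

```python
# Minimal sketch (assumed variable names): a fixed-effects propensity score that
# absorbs cluster-level confounding by adding J - 1 cluster-membership dummies
# to a logistic regression for the treatment.
import pandas as pd
import statsmodels.api as sm

def fixed_effects_propensity(df, treatment, covariates, cluster):
    """Return estimated propensity scores from a cluster fixed-effects logit."""
    # J - 1 dummy columns for cluster membership (first cluster is the reference)
    dummies = pd.get_dummies(df[cluster], prefix="cl", drop_first=True, dtype=float)
    X = sm.add_constant(pd.concat([df[covariates], dummies], axis=1))
    fit = sm.Logit(df[treatment], X).fit(disp=0)
    return fit.predict(X)

# Hypothetical usage: feed these scores to a TMLE routine as its estimate of the
# treatment mechanism. A random-effects variant would instead model the cluster
# intercepts as random effects (e.g., a mixed-effects logit).
# ps = fixed_effects_propensity(data, "T", ["x1", "x2"], "school_id")
```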
“…Q-fe will remove the cluster-level impact on the outcome without requiring knowledge of cluster-level covariates and can be fitted by adding a J − 1 cluster dummy matrix, S_j, that indicates individual i's cluster membership in one of the J − 1 clusters. This modification is motivated by the econometrics or causal inference literature (e.g., Suk & Kang, 2022a, 2022b; Wooldridge, 2010), where unobserved cluster-specific effects are often modeled as fixed effects. Also, we propose two modifications for Q-learning based on random effects outcome models, and the models are written as:…”
Section: Modifications for Q-Learning (mentioning)
confidence: 99%
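As a rough illustration of the Q-fe idea in the passage above, the sketch below fits an outcome regression that includes a J − 1 matrix of cluster-membership dummies; all names are hypothetical, and the random-effects outcome models mentioned in the quote (whose formulas are elided) are not reproduced here.

```python
# Minimal sketch (assumed variable names) of a Q-fe style outcome regression:
# Y ~ T + X + S, where S holds J - 1 cluster dummies so cluster-specific effects
# on the outcome are removed without observing cluster-level covariates.
import pandas as pd
import statsmodels.api as sm

def fit_q_fixed_effects(df, outcome, treatment, covariates, cluster):
    """Fit the outcome model with treatment, covariates, and J - 1 cluster dummies."""
    S = pd.get_dummies(df[cluster], prefix="S", drop_first=True, dtype=float)
    X = sm.add_constant(pd.concat([df[[treatment] + covariates], S], axis=1))
    return sm.OLS(df[outcome], X).fit()

# The fitted model can then supply Q-function values by predicting the outcome
# with the treatment column set to 0 and to 1 for every individual.
```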
“…Model (11) uses the demeaned outcome Y*_ij, demeaned covariates, and demeaned treatment to estimate regression coefficients in the outcome regression. The motivation for the three modifications is rooted in the idea that by subtracting cluster means from the original variables, we can create variables that are locally orthogonal to cluster-specific components in the original variables (as demonstrated in previous research by Athey et al. (2019) and Suk and Kang (2022b)). This is important because when unmeasured cluster-level covariates are present, they remain in the subspace that has cluster-specific variations only.…”
Section: Modifications for Q-Learning (mentioning)
confidence: 99%
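The demeaning step described above amounts to within-cluster centering of the outcome, covariates, and treatment. The snippet below is an illustrative sketch with invented variable names, not the cited implementation.

```python
# Sketch of within-cluster demeaning: subtract cluster means so that the
# transformed variables carry only within-cluster variation, projecting out
# cluster-specific components (including unmeasured cluster-level confounders).
import pandas as pd

def within_cluster_demean(df, cols, cluster):
    """Replace cols by deviations from their cluster means, e.g. Y*_ij = Y_ij - mean_j(Y)."""
    out = df.copy()
    out[cols] = df[cols] - df.groupby(cluster)[cols].transform("mean")
    return out

# Hypothetical usage: regress the demeaned outcome on the demeaned treatment and
# covariates, in the spirit of model (11).
# star = within_cluster_demean(data, ["Y", "T", "x1", "x2"], "school_id")
```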
“…Following the data analysis procedure used in Suk and Kang (2022b), we used both the fifth-grade assessment data from the spring of 2004 and the eighth-grade assessment data from the spring of 2007. The 2004 data were used to obtain pre-treatment covariates (e.g., prior achievement scores, gender) that affect the treatment mechanism and the outcome process, based on prior work about algebra courses in middle school (Rickles, 2013; Rickles & Seltzer, 2014; Suk & Kang, 2022b; Walston & McCarroll, 2010). We used the 2007 data to obtain the treatment and outcome variables.…”
Section: Data and Variables (mentioning)
confidence: 99%
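As an illustration of the wave structure described in that passage, the sketch below assembles an analysis file by taking pre-treatment covariates from the 2004 fifth-grade wave and the treatment and outcome from the 2007 eighth-grade wave. Every file name and column name is hypothetical; the actual study files use their own variable codes.

```python
# Hypothetical sketch of merging the two assessment waves into one analysis file.
import pandas as pd

wave_2004 = pd.read_csv("grade5_spring2004.csv")   # hypothetical file name
wave_2007 = pd.read_csv("grade8_spring2007.csv")   # hypothetical file name

# Pre-treatment covariates from 2004; treatment and outcome from 2007.
covariates = wave_2004[["student_id", "school_id", "prior_math_score", "female"]]
treat_out = wave_2007[["student_id", "algebra_grade8", "math_score_2007"]]

analysis = covariates.merge(treat_out, on="student_id", how="inner")
```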