Machine learning in the estimation of causal effects: targeted minimum loss-based estimation and double/debiased machine learning

Díaz, Iván

doi:10.1093/biostatistics/kxz042

Cited by 38 publications (32 citation statements); references 21 publications


“…In this paper, we focused on the estimation of a causal effect given that the identifiability conditions were satisfied. In practice, the predictive performance of the Q-model is not sufficient to ensure the absence of bias in the estimation of the causal effect, which requires a precise conceptual knowledge of the causal model 35 .…”

Section: Discussion; classified as mentioning (confidence: 99%)

G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes

Borgne, Chatton, Léger, et al. (2021), Sci Rep


In clinical research, there is growing interest in the use of propensity score-based methods to estimate causal effects. G-computation (GC) is an alternative, owing to its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary, and that can handle small samples. We evaluated the performance of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner, through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure statuses, and outcomes. We also illustrated the application of these methods by using them to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in two counterfactual worlds, we found that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation combined with the super learner performed well for drawing causal inferences, even from small samples.
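The abstract describes G-computation as predicting each individual's outcome probability in two counterfactual worlds (everyone exposed, no one exposed) and then averaging those predictions. The sketch below shows that plug-in standardization step under stated assumptions. It uses a single discrete covariate W, binary exposure A and outcome Y, and a (w, a, y) tuple layout. An empirical stratum-mean Q-model stands in for the machine learners compared in the paper, such as the super learner. The function names are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def fit_outcome_model(data):
    """Hypothetical Q-model: empirical P(Y=1 | A=a, W=w) in each (a, w) stratum.

    Assumption: stratum means stand in for the ML learner (e.g. a super learner).
    """
    groups = defaultdict(list)
    for w, a, y in data:
        groups[(a, w)].append(y)
    return {key: mean(ys) for key, ys in groups.items()}


def g_computation(data):
    """Return (risk if all exposed, risk if none exposed, risk difference)."""
    q = fit_outcome_model(data)

    def predict(a, w):
        if (a, w) not in q:
            # No observed units in this stratum: positivity is violated here.
            raise ValueError(f"no observations with A={a}, W={w}")
        return q[(a, w)]

    # Counterfactual world 1: set A=1 for every unit, keep each unit's own W.
    r1 = mean(predict(1, w) for w, _, _ in data)
    # Counterfactual world 0: set A=0 for every unit.
    r0 = mean(predict(0, w) for w, _, _ in data)
    return r1, r0, r1 - r0
```

Swapping in a flexible learner would change only `fit_outcome_model`. The averaging over counterfactual predictions stays the same.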


“…It is an effective approach by using machine learning methods to predict active antibacterial compounds [ 26 , 28 , 36 ]. The accuracy of the prediction model is affected by many factors, such as the quality of the benchmark datasets [ 37 ], the representative molecular characteristics of the compounds [ 16 ], the applicable machine learning models [ 9 ], and the optimized model parameters [ 38 ]. This study collected a large amount of experimental data on the antibacterial activity of compounds from the ChEMBL and PubChem databases.…”

Section: Discussion; classified as mentioning (confidence: 99%)

Screening of antibacterial compounds with novel structure from the FDA approved drugs using machine learning methods

Tong, Yang, et al. (2022), Aging


Bacterial infection is one of the most important factors affecting the human life span. Elderly people are more harmed by bacterial infections because of their immune deficits. Because few new antibiotics have been developed in recent years, bacterial resistance has increasingly become a serious global problem. In this study, an antibacterial compound predictor was constructed using support vector machines and random forests, trained on data for active and inactive antibacterial compounds from the ChEMBL database. Both models showed excellent prediction performance (mean accuracy >0.9 and mean AUC >0.9). We used the predictor to screen FDA-approved drugs in the DrugBank database for potential antibacterial compounds. The screening identified 1087 small-molecule drugs with potential antibacterial activity, 154 of which are FDA-approved antibacterial drugs, accounting for 76.2% of the approved antibacterial drugs collected in this study. Through molecular fingerprint similarity analysis and common substructure analysis, we identified 8 predicted antibacterial small-molecule compounds whose structures are novel compared with known antibacterial drugs; 5 of them are widely used in the treatment of various tumors. This study provides new insight into predicting antibacterial compounds among approved drugs; the predicted compounds might be used to treat bacterial infections and extend lifespan.


“…First, while our simulations aimed at providing new empirical evidence about the operating characteristics of state‐of‐the‐art machine learning techniques for estimating TEH with survival data, we only provided frequentist coverage probability for AFT‐BART‐NP because the credible intervals are readily available from the MCMC output. On the other hand, it may be challenging to precisely estimate the variance of the ISTE using DL, RSF, and TSHEE, without resorting to more computationally intensive sample‐splitting methods 2,15,58,59 . Developing and investigating new methods for the variance and interval estimation for frequentist machine learning represent an important avenue for future research.…”

Section: Discussion; classified as mentioning (confidence: 99%)

Estimating heterogeneous survival treatment effect in observational data using machine learning

(2021), Statistics in Medicine


Methods for estimating heterogeneous treatment effect in observational data have largely focused on continuous or binary outcomes, and have been relatively less vetted with survival outcomes. Using flexible machine learning methods in the counterfactual framework is a promising approach to address challenges due to complex individual characteristics, to which treatments need to be tailored. To evaluate the operating characteristics of recent survival machine learning methods for the estimation of treatment effect heterogeneity and inform better practice, we carry out a comprehensive simulation study presenting a wide range of settings describing confounded heterogeneous survival treatment effects and varying degrees of covariate overlap. Our results suggest that the nonparametric Bayesian Additive Regression Trees within the framework of accelerated failure time model (AFT‐BART‐NP) consistently yields the best performance, in terms of bias, precision, and expected regret. Moreover, the credible interval estimators from AFT‐BART‐NP provide close to nominal frequentist coverage for the individual survival treatment effect when the covariate overlap is at least moderate. Including a nonparametrically estimated propensity score as an additional fixed covariate in the AFT‐BART‐NP model formulation can further improve its efficiency and frequentist coverage. Finally, we demonstrate the application of flexible causal machine learning estimators through a comprehensive case study examining the heterogeneous survival effects of two radiotherapy approaches for localized high‐risk prostate cancer.
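The citation statement above notes that frequentist variance estimation for flexible learners may require sample-splitting methods. The sketch below illustrates that idea in the spirit of the double/debiased machine learning approach named in this page's title. It cross-fits an augmented inverse probability weighted (AIPW) estimator of the average treatment effect and takes the standard error from the empirical variance of the estimated influence function. This is a swapped-in, simplified setting (binary outcome, average effect), not the survival or individual-effect setting of the citing study. Assumptions: one discrete covariate W, binary A and Y, and stratum means standing in for the machine-learning nuisance models. The function names, fold count, and truncation level `eps` are hypothetical choices.

```python
import math
import random
from collections import defaultdict
from statistics import mean, pvariance


def stratum_mean(pairs):
    """Map each stratum key to the mean of its values; also return the overall mean."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    overall = mean(v for _, v in pairs)
    return {k: mean(vs) for k, vs in groups.items()}, overall


def crossfit_aipw(data, n_folds=2, seed=0, eps=0.01):
    """Cross-fitted AIPW estimate of E[Y(1)] - E[Y(0)] and its standard error.

    data: list of (w, a, y). Nuisance models are fit on the other folds and
    evaluated on the held-out fold (sample splitting / cross-fitting).
    """
    n = len(data)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::n_folds] for j in range(n_folds)]
    phi = [0.0] * n  # uncentered influence-function values
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(n) if i not in held_out]
        # Assumption: stratum means replace ML learners for g and Q.
        g_map, g_all = stratum_mean([(w, a) for w, a, _ in train])        # P(A=1 | W)
        q_map, q_all = stratum_mean([((a, w), y) for w, a, y in train])   # E[Y | A, W]
        for i in fold:
            w, a, y = data[i]
            g = min(max(g_map.get(w, g_all), eps), 1 - eps)  # truncated propensity
            q1 = q_map.get((1, w), q_all)
            q0 = q_map.get((0, w), q_all)
            phi[i] = q1 - q0 + a * (y - q1) / g - (1 - a) * (y - q0) / (1 - g)
    psi = mean(phi)
    se = math.sqrt(pvariance(phi, psi) / n)
    return psi, se
```

A Wald-type 95% interval would then be `psi ± 1.96 * se`. Each prediction comes from models that did not see that unit, which is what lets the simple influence-function variance be used with flexible learners.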
