Propensity score prediction for electronic healthcare databases using super learner and high-dimensional propensity score methods

Ju, Cheng; Combs, Mary; Lendle, Samuel; Franklin, Jessica M.; Wyss, Richard; Schneeweiß, Sebastian; Laan, Mark J. van der

doi:10.1080/02664763.2019.1582614

Cited by 43 publications (49 citation statements)

References 33 publications


“…Super Learner has also been considered in the context of longitudinal data, where it has been found useful in the presence of model misspecification. Overall, Super Learner is now being widely adopted in the causal inference literature and in applications…”

Section: Introduction (mentioning; confidence: 99%)

“…[14] Overall, Super Learner is now being widely adopted in the causal inference literature and in applications. [15][16][17][18] In much of this literature, and certainly in most applications, adjustment via the propensity score is achieved in a singly robust fashion, that is, designed to give consistent inference on the causal estimand under an assumption of correct specification (at least within a class of parametric, flexible, or ensemble procedures). Doubly robust procedures that provide consistent estimation if either the propensity score or a proposed conditional outcome mean model (as would be utilized in standard regression) are correctly specified are also well established in the statistical literature, but such procedures are not as widely adopted in practice.…”

Section: Introduction (mentioning; confidence: 99%)
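A standard example of the doubly robust procedures mentioned in this statement is the augmented inverse probability weighting (AIPW) estimator of the average treatment effect. The sketch below is a minimal illustration, not code from any of the cited works: it assumes the propensity scores `ps` and the outcome-model predictions `mu1`, `mu0` have already been produced by some fitted models, and the function name `aipw_ate` is hypothetical.

```python
from typing import Sequence


def aipw_ate(a: Sequence[int], y: Sequence[float], ps: Sequence[float],
             mu1: Sequence[float], mu0: Sequence[float]) -> float:
    """Augmented IPW estimate of the average treatment effect E[Y(1) - Y(0)].

    a   : treatment indicators (0/1)
    y   : observed outcomes
    ps  : estimated propensity scores P(A = 1 | X), strictly between 0 and 1
    mu1 : outcome-model predictions of E[Y | A = 1, X]
    mu0 : outcome-model predictions of E[Y | A = 0, X]
    """
    total = 0.0
    for ai, yi, ei, m1, m0 in zip(a, y, ps, mu1, mu0):
        # Outcome-model prediction plus an inverse-probability-weighted
        # correction of its residual. If the outcome model is correct the
        # correction averages to zero; if the propensity score is correct
        # the correction removes the outcome model's bias.
        treated_arm = m1 + ai * (yi - m1) / ei
        control_arm = m0 + (1 - ai) * (yi - m0) / (1 - ei)
        total += treated_arm - control_arm
    return total / len(a)
```

Two limiting cases show the double robustness: with the outcome predictions set to zero the estimator reduces to plain inverse probability weighting, and when the outcome predictions match the observed outcomes in each arm the weighting terms vanish, so the propensity scores no longer matter.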


Should a propensity score model be super? The utility of ensemble procedures for causal adjustment

Alam, Moodie, Stephens (2018). Statistics in Medicine.


In investigations of the effect of treatment on outcome, the propensity score is a tool to eliminate imbalance in the distribution of confounding variables between treatment groups. Recent work has suggested that Super Learner, an ensemble method, outperforms logistic regression in nonlinear settings; however, experience with real‐data analyses tends to show overfitting of the propensity score model using this approach. We investigated a wide range of simulated settings of varying complexities including simulations based on real data to compare the performances of logistic regression, generalized boosted models, and Super Learner in providing balance and for estimating the average treatment effect via propensity score regression, propensity score matching, and inverse probability of treatment weighting. We found that Super Learner and logistic regression are comparable in terms of covariate balance, bias, and mean squared error (MSE); however, Super Learner is computationally very expensive thus leaving no clear advantage to the more complex approach. Propensity scores estimated by generalized boosted models were inferior to the other two estimation approaches. We also found that propensity score regression adjustment was superior to either matching or inverse weighting when the form of the dependence on the treatment on the outcome is correctly specified.
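Covariate balance, one of the criteria compared in this abstract, is commonly summarised by the standardized mean difference (SMD) of each covariate between treatment groups, computed before and after inverse probability of treatment weighting. The following is a minimal sketch assuming already-estimated propensity scores; the helper names are hypothetical, and the pooled within-group (weighted) variance used in the denominator is one of several conventions in use.

```python
import math
from typing import List, Optional, Sequence, Tuple


def iptw_weights(a: Sequence[int], ps: Sequence[float]) -> List[float]:
    """ATE-type inverse probability of treatment weights."""
    # Treated units get 1 / e(x); controls get 1 / (1 - e(x)).
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(a, ps)]


def _weighted_mean_var(x: Sequence[float],
                       w: Sequence[float]) -> Tuple[float, float]:
    sw = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / sw
    v = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sw
    return m, v


def smd(x: Sequence[float], a: Sequence[int],
        w: Optional[Sequence[float]] = None) -> float:
    """(Weighted) standardized mean difference of x, treated minus control."""
    if w is None:
        w = [1.0] * len(x)  # unweighted comparison
    stats = {}
    for t in (1, 0):
        xs = [xi for xi, ti in zip(x, a) if ti == t]
        ws = [wi for wi, ti in zip(w, a) if ti == t]
        stats[t] = _weighted_mean_var(xs, ws)
    (m1, v1), (m0, v0) = stats[1], stats[0]
    # Pooled standard deviation from the two within-group variances.
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)
```

For a binary covariate whose prevalence differs sharply between arms, `smd(x, a)` is large, while `smd(x, a, iptw_weights(a, ps))` is close to zero when the propensity scores are correct; this is the kind of before/after comparison on which logistic regression, generalized boosted models, and Super Learner were compared.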


“…This is especially true when computationally intensive learners, such as bagged CART or boosted CART, are included in the candidate library. In general, the computation time for SL is at least twice the sum of all the candidate learners' computation time, considering fitting on the training sets, computing the corresponding weights from the validation sets, and fitting the entire data eventually. Similar to other MSCM simulation studies, we computed robust sandwich standard error in this paper.…”

Section: Discussion (mentioning; confidence: 99%)
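The "at least twice" figure in this statement can be checked with back-of-the-envelope arithmetic. Under the illustrative assumption (ours, not the cited paper's) that each candidate's fit time grows linearly with sample size, V-fold cross-validation fits every learner V times on (V-1)/V of the data, which costs about (V-1) full-data fits, and the final refit on the entire data adds one more. The total is therefore roughly V times the summed candidate cost, which is at least 2 times for any V ≥ 2. The hypothetical helper `sl_fit_time` encodes this model and ignores the usually cheap weight-optimisation step.

```python
from typing import Sequence


def sl_fit_time(learner_times: Sequence[float], v_folds: int) -> float:
    """Rough fitting time for a V-fold Super Learner.

    learner_times: time for each candidate learner to fit once on the full data.
    Assumes fit time scales linearly with sample size and ignores the
    step that optimises the ensemble weights.
    """
    if v_folds < 2:
        raise ValueError("V-fold cross-validation needs at least 2 folds")
    full_data_cost = sum(learner_times)
    # V fits per learner, each on (V-1)/V of the data: about (V-1) full fits.
    cv_cost = (v_folds - 1) * full_data_cost
    # One final refit of every candidate on the entire data.
    return cv_cost + full_data_cost
```

With two candidates taking 1 and 2 time units on the full data, 10-fold cross-validation gives about 30 units, ten times the 3-unit sum; learners whose cost grows faster than linearly in n (as some tree ensembles can) would shift these numbers.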

Estimating inverse probability weights using super learner when weight-model specification is unknown in a marginal structural Cox model context

Karim, Platt (2017). Statistics in Medicine.


Correct specification of the inverse probability weighting (IPW) model is necessary for consistent inference from a marginal structural Cox model (MSCM). In practical applications, researchers are typically unaware of the true specification of the weight model. Nonetheless, IPWs are commonly estimated using parametric models, such as the main-effects logistic regression model. In practice, assumptions underlying such models may not hold and data-adaptive statistical learning methods may provide an alternative. Many candidate statistical learning approaches are available in the literature. However, the optimal approach for a given dataset is impossible to predict. Super learner (SL) has been proposed as a tool for selecting an optimal learner from a set of candidates using cross-validation. In this study, we evaluate the usefulness of a SL in estimating IPW in four different MSCM simulation scenarios, in which we varied the specification of the true weight model specification (linear and/or additive). Our simulations show that, in the presence of weight model misspecification, with a rich and diverse set of candidate algorithms, SL can generally offer a better alternative to the commonly used statistical learning approaches in terms of MSE as well as the coverage probabilities of the estimated effect in an MSCM. The findings from the simulation studies guided the application of the MSCM in a multiple sclerosis cohort from British Columbia, Canada (1995-2008), to estimate the impact of beta-interferon treatment in delaying disability progression. Copyright © 2017 John Wiley & Sons, Ltd.
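The cross-validation selector described in this abstract (often called the discrete Super Learner) can be sketched in a few lines of plain Python. This is a minimal illustration, not the SL implementation used in the study: the two toy candidate learners, their Laplace-smoothed rates, the interleaved fold assignment, and the log-loss criterion are all assumptions made for the example, and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

# A learner takes training (x, y) and returns a function x -> P(Y = 1 | x).
Learner = Callable[[Sequence[int], Sequence[int]], Callable[[int], float]]


def marginal_rate(x: Sequence[int], y: Sequence[int]) -> Callable[[int], float]:
    """Intercept-only candidate: ignores x, Laplace-smoothed overall rate."""
    p = (sum(y) + 1) / (len(y) + 2)
    return lambda xi: p


def stratified_rate(x: Sequence[int], y: Sequence[int]) -> Callable[[int], float]:
    """Candidate estimating a separate smoothed rate per level of discrete x."""
    rates = {}
    for level in set(x):
        ys = [yi for xi, yi in zip(x, y) if xi == level]
        rates[level] = (sum(ys) + 1) / (len(ys) + 2)
    fallback = (sum(y) + 1) / (len(y) + 2)  # unseen level -> overall rate
    return lambda xi: rates.get(xi, fallback)


def log_loss(y: Sequence[int], p: Sequence[float]) -> float:
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)


def discrete_super_learner(x, y, learners: Dict[str, Learner], v: int = 5):
    """Select the candidate with the smallest V-fold cross-validated log-loss,
    then refit the selected candidate on all of the data."""
    n = len(y)
    folds = [list(range(k, n, v)) for k in range(v)]  # interleaved, deterministic
    cv_risk: Dict[str, float] = {}
    for name, fit in learners.items():
        preds = [0.0] * n
        for val in folds:
            held_out = set(val)
            train = [i for i in range(n) if i not in held_out]
            predict = fit([x[i] for i in train], [y[i] for i in train])
            for i in val:
                preds[i] = predict(x[i])  # out-of-fold prediction
        cv_risk[name] = log_loss(y, preds)
    best = min(cv_risk, key=cv_risk.get)
    return best, learners[best](x, y), cv_risk
```

The full Super Learner goes one step further and combines the candidates with weights estimated from the same out-of-fold predictions, rather than keeping only the single best one.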


“…Therefore, the weights of the super learner are calculated by minimizing the single-split cross-validated loss as suggested in [9]. Ju et al [34] show the success of the single-split super learner on three large healthcare databases.…”

Section: Cross-validation (mentioning; confidence: 99%)
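The single-split weighting in this quote can be sketched as follows. Assumptions (ours, not taken from the cited papers): two candidate learners have already been fit on the training portion of one train/validation split, the ensemble is a convex combination of their predicted probabilities, the loss is log-loss on the validation portion, and the mixing weight is found by a simple grid search where a constrained optimiser would normally be used. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_loss(y: Sequence[int], p: Sequence[float]) -> float:
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)


def single_split_weights(y_val: Sequence[int],
                         pred_a: Sequence[float],
                         pred_b: Sequence[float],
                         grid: int = 101) -> Tuple[float, float]:
    """Convex weights (alpha, 1 - alpha) for two candidates, chosen to minimise
    log-loss on a single held-out validation split.

    pred_a, pred_b: predicted probabilities on the validation split from
    learners that were fit on the training split only.
    """
    best_alpha, best_risk = 0.0, float("inf")
    for k in range(grid):
        alpha = k / (grid - 1)
        blend = [alpha * pa + (1 - alpha) * pb for pa, pb in zip(pred_a, pred_b)]
        risk = log_loss(y_val, blend)
        if risk < best_risk:  # strict: ties keep the smaller alpha
            best_alpha, best_risk = alpha, risk
    return best_alpha, 1.0 - best_alpha
```

Compared with V-fold cross-validation, each candidate is fit only once before the weights are estimated, which is what makes the single-split version attractive for very large databases; the price is that the weights rest on a single validation sample.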

A Super-Learner Ensemble of Deep Networks for Vehicle-Type Classification

2020


Automatic vehicle-type classification plays an imperative role in the development of efficient Intelligent Transportation Systems (ITS). In this paper, a super-learner ensemble is proposed for the vehicle-type classification problem. A densely connected single-split super learner is utilized to exploit the strengths and diminish the weaknesses of the individual base learners ResNet50, Xception, and DenseNet. The super learner aims to learn fusion weights in a data-adaptive manner to obtain the optimal combination of the base learners. The proposed method is simple, robust, and enhances the discrimination capabilities among the similarly-looking classes without requiring any hand-crafted features or logical reasoning. The proposed method is evaluated using two of the most challenging publicly available traffic surveillance datasets: the MIOvision Traffic Camera Dataset (MIO-TCD) and the Beijing Institute of Technology's (BIT) vehicle classification dataset. Three variants of the super learner ensemble: RXD-CV-CW, RXD-CV-CW-NCW and Augmented-RXD, were examined on the MIO-TCD dataset with variations in applying class weights and data augmentation during training. RXD-CV-CW-NCW and Augmented-RXD share the third place among the published state-of-the-art methods reported in the MIO-TCD classification challenge. Augmented-RXD generalizes to the classes in common between the two datasets without degrading its performance on the MIO-TCD dataset. Both variants achieved an overall accuracy of 97.94%, and a Cohen Kappa score of 96.78%. In addition, the super-learner variants that we trained on the BIT-Vehicle dataset images achieved overall accuracies of up to 97.62%.

Index terms: deep learning, ensemble learning, intelligent transport systems, vehicle classification.

• We present a super-learner ensemble model for vehicle-type classification in surveillance frames. The super learner consists of a fully-connected layer added to the fused outputs of three base learners: ResNet50 [10], Xception [11], and DenseNet [12].
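The fusion step described here (a fully-connected layer applied to the fused outputs of the base learners) can be illustrated with a framework-free forward pass. This is a hypothetical sketch, not the authors' model: it assumes each base learner emits a class-probability vector, that "fused" means concatenated, and that the layer is followed by a softmax. In practice the weights `W` and bias `b` would be learned on held-out data rather than set by hand.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def fuse(base_outputs: Sequence[Sequence[float]],
         W: Sequence[Sequence[float]],
         b: Sequence[float]) -> List[float]:
    """Fully-connected fusion of concatenated base-learner outputs.

    base_outputs: one class-probability vector per base learner.
    W: one row per output class, each of length n_learners * n_classes.
    b: one bias per output class.
    """
    x = [v for out in base_outputs for v in out]  # concatenation
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must match the concatenated input length")
    logits = [sum(wij * xj for wij, xj in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)
```

With a hand-set `W` that simply adds each class's probability across three learners, the layer reduces to a softmax over summed votes; a learned `W` can instead up-weight whichever base network is more reliable for a given class, which is the data-adaptive combination the abstract describes.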

