Yaoyuan Vincent Tan scite author profile

Bayesian additive regression trees (BART) is a flexible prediction model and machine learning approach that has gained widespread popularity in recent years. As BART becomes more mainstream, there is an increased need for a paper that walks readers through the details of BART, from what it is to why it works. This tutorial aims to provide such a resource. In addition to explaining the different components of BART using simple examples, we also discuss a framework, the General BART model, that unifies some of the recent BART extensions, including semiparametric models, correlated outcomes, statistical matching problems in surveys, and models with weaker distributional assumptions. By showing how these models fit into a single framework, we hope to demonstrate a simple way of applying BART to research problems that go beyond the original independent continuous or binary outcomes framework.
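
For orientation, the sum-of-trees form that BART is usually written in for a continuous outcome (following Chipman et al., 2010) can be sketched as below. The notation is an assumption made here for illustration and is not taken from the abstract: T_j denotes the structure of the j-th tree, M_j its terminal-node parameters, g returns the terminal-node value that x_i falls into, and m is the number of trees.

```latex
Y_i = \sum_{j=1}^{m} g(x_i; T_j, M_j) + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```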

Trends in Length of Stay, Functional Outcomes, and Discharge Destination Stratified by Disease Type for Inpatient Rehabilitation in Singapore Community Hospitals From 1996 to 2005

Chen, Koh, Naidoo et al. 2013

Archives of Physical Medicine and Rehabilitation

Predicting human-driving behavior to help driverless vehicles drive: random intercept Bayesian additive regression trees

Tan, Flannagan, Elliott 2018

The development of driverless vehicles has spurred the need to predict human driving behavior to facilitate interaction between driverless and human-driven vehicles. Predicting human driving movements can be challenging, and poor prediction models can lead to accidents between driverless and human-driven vehicles. We used the vehicle speed obtained from a naturalistic driving dataset to predict whether a human-driven vehicle would stop before executing a left turn. In a preliminary analysis, we found that BART produced less variable and higher AUC values compared to a variety of other state-of-the-art binary prediction methods. However, BART assumes independent observations, whereas our dataset consists of multiple observations clustered by driver. Although methods extending BART to clustered or longitudinal data are available, they lack readily available software and can only be applied to clustered continuous outcomes. We extend BART to handle correlated binary observations by adding a random intercept, and we used a simulation study to determine bias, root mean squared error, 95% coverage, and average length of the 95% credible interval in a correlated data setting. We then successfully applied our random intercept BART model to our clustered dataset and found substantial improvements in prediction performance compared to BART and random intercept linear logistic regression. (arXiv:1609.07464v2 [stat.AP], 1 May 2017)

“Robust-Squared” Imputation Models Using Bart

Tan, Flannagan, Elliott 2019

Examples of "doubly robust" estimators for missing data include augmented inverse probability weighting (AIPWT) models (Robins et al., 1994) and penalized splines of propensity prediction (PSPP) models (Zhang and Little, 2009). Doubly robust estimators have the property that, if either the response propensity or the mean is modeled correctly, a consistent estimator of the population mean is obtained. However, doubly robust estimators can perform poorly when modest misspecification is present in both models (Kang and Schafer, 2007). Here we consider extensions of the AIPWT and PSPP models that use Bayesian Additive Regression Trees (BART; Chipman et al., 2010) to provide highly robust propensity and mean model estimation. We term these "robust-squared" in the sense that the propensity score, the means, or both can be estimated with minimal model misspecification and applied to the doubly robust estimator. We consider their behavior via simulations where propensity and/or mean models are misspecified. We apply our proposed method to impute missing instantaneous velocity (delta-v) values from the 2014 National Automotive Sampling System Crashworthiness Data System dataset and missing Blood Alcohol Concentration values from the 2015 Fatality Analysis Reporting System dataset. We found that BART applied to PSPP and AIPWT provides a more robust and efficient estimate compared to PSPP and AIPWT alone, with the BART-estimated propensity score combined with PSPP providing the most efficient estimator with close to nominal coverage.
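
The double-robustness property described above can be illustrated with a minimal sketch. This is not the authors' BART-based method: it uses a plain augmented inverse probability weighting estimator with simple parametric working models on simulated data, and the function name `aipw_mean`, the data-generating process, and the deliberately misspecified models are all assumptions made for illustration.

```python
import numpy as np

def aipw_mean(y, r, mean_pred, propensity):
    """AIPW estimate of the population mean of y, where y is observed only if r == 1.

    mean_pred  : predicted E[y | x] for every unit (hypothetical working model)
    propensity : predicted P(r = 1 | x) for every unit (hypothetical working model)
    """
    y_filled = np.where(r == 1, y, 0.0)  # unobserved outcomes never contribute
    correction = r * (y_filled - mean_pred) / propensity
    return float(np.mean(mean_pred + correction))

# Simulated data (assumed for illustration): the true population mean of y is 2.0
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)
true_propensity = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # response depends on x only
r = rng.binomial(1, true_propensity)

# Correct mean model: least squares fitted on respondents only
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[r == 1], y[r == 1], rcond=None)
good_mean = X @ beta

# Deliberately wrong working models: both ignore x entirely
wrong_propensity = np.full(n, r.mean())
wrong_mean = np.full(n, y[r == 1].mean())

naive = float(y[r == 1].mean())                                # respondent mean, biased
est_mean_ok = aipw_mean(y, r, good_mean, wrong_propensity)     # only the mean model is right
est_prop_ok = aipw_mean(y, r, wrong_mean, true_propensity)     # only the propensity is right

print(f"naive={naive:.3f}  mean-model-correct={est_mean_ok:.3f}  "
      f"propensity-correct={est_prop_ok:.3f}")
```

In this setup the respondent-only mean is biased upward because units with larger x respond more often, while each AIPW estimate should land near the true mean as long as one of its two working models is correct. When both are misspecified, no such protection holds, which is the gap the abstract's "robust-squared" approach targets.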

Development of a real-time prediction model of driver behavior at intersections using kinematic time series data

Tan, Elliott, Flannagan 2017

Accident Analysis & Prevention

Accounting for selection bias due to death in estimating the effect of wealth shock on cognition for the Health and Retirement Study

Tan, Flannagan, Pool et al. 2021

Statistics in Medicine

The Health and Retirement Study (HRS) is a longitudinal study of U.S. adults enrolled at age 50 and older. We were interested in investigating the effect of a sudden large decline in wealth on the cognitive ability of subjects, measured using a composite score provided with the dataset. However, our analysis was complicated by the lack of randomization, time-dependent confounding, and the fact that a substantial fraction of the sample and population will die during follow-up, leading to some of our outcomes being censored. The common method to handle this type of problem is marginal structural models (MSM). Although MSM produces valid estimates, it may not be the most appropriate method to reflect a useful real-world situation, because MSM upweights subjects who are more likely to die to obtain a hypothetical population that, over time, resembles the one that would have been obtained in the absence of death. A more refined and practical framework, principal stratification (PS), would be to restrict analysis to the strata of the population that would survive regardless of negative wealth shock experience. In this work, we propose a new algorithm for the estimation of the treatment effect under PS by imputing the counterfactual survival status and outcomes. Simulation studies suggest that our algorithm works well in various scenarios. We found no evidence that a negative wealth shock experience would affect the cognitive score of HRS subjects.

Predicting human-driving behavior to help driverless vehicles drive: random intercept Bayesian Additive Regression Trees

Tan, Flannagan, Elliott 2016

Preprint

Bayesian additive regression trees and the General BART model

Tan, Roy 2019

Preprint
