Ray Bai scite author profile

High-dimensional multivariate posterior consistency under global–local shrinkage priors

We consider sparse Bayesian estimation in the classical multivariate linear regression model with p regressors and q response variables. In univariate Bayesian linear regression with a single response y, shrinkage priors which can be expressed as scale mixtures of normal densities are popular for obtaining sparse estimates of the coefficients. In this paper, we extend the use of these priors to the multivariate case to estimate a p × q coefficient matrix B. We derive sufficient conditions for posterior consistency under the Bayesian multivariate linear regression framework and prove that our method achieves posterior consistency even when p > n and even when p grows at a nearly exponential rate with the sample size. We derive an efficient Gibbs sampling algorithm and provide the implementation in a comprehensive R package called MBSP. Finally, we demonstrate through simulations and data analysis that our model has excellent finite sample performance.

Forecasting urban household water demand with statistical and machine learning methods using large space-time data: A Comparative study

Duerr

Merrill

Wang

et al. 2018

Environmental Modelling & Software

Spike-and-Slab Group Lassos for Grouped Regression and Sparse Generalized Additive Models

Bai

Moran

Antonelli

et al. 2020

Journal of the American Statistical Association

Fast Algorithms and Theory for High-Dimensional Bayesian Varying Coefficient Models

Bai

Boland

Chen

2019

Preprint

Nonparametric varying coefficient (NVC) models are widely used for modeling time-varying effects on responses that are measured repeatedly. In this paper, we introduce the nonparametric varying coefficient spike-and-slab lasso (NVC-SSL) for Bayesian estimation and variable selection in NVC models. The NVC-SSL simultaneously estimates the functionals of the significant time-varying covariates while thresholding out insignificant ones. Our model can be implemented using a highly efficient expectation-maximization (EM) algorithm, thus avoiding the computational burden of Markov chain Monte Carlo (MCMC) in high dimensions. In contrast to frequentist NVC models, hardly anything is known about the large-sample properties for Bayesian NVC models. In this paper, we take a step towards addressing this longstanding gap between methodology and theory by deriving posterior contraction rates under the NVC-SSL model when the number of covariates grows at nearly exponential rate with sample size. Finally, we illustrate our methodology through simulation studies and data analysis.

Spike-and-Slab Group Lasso for Consistent Estimation and Variable Selection in Non-Gaussian Generalized Additive Models

Bai

2020

Preprint

VCBART: Bayesian trees for varying coefficients

Deshpande

Bai

Balocchi

et al. 2020

Preprint

The linear varying coefficient (VC) model generalizes the conventional linear model by allowing the additive effect of each covariate on the outcome to vary as a function of additional effect modifiers. While there are many existing procedures for VC modeling with a single scalar effect modifier (often assumed to be time), there has, until recently, been comparatively less development for settings with multivariate modifiers. Unfortunately, existing state-of-the-art procedures that can accommodate multivariate modifiers typically make restrictive structural assumptions about the covariate effect functions or require intensive problem-specific hand-tuning that scales poorly to large datasets. In response, we propose VC-BART, which estimates the covariate effect functions in a VC model using Bayesian Additive Regression Trees (BART). On several synthetic and real-world data sets, we demonstrate that, with simple default hyperparameter settings, VC-BART displays covariate effect recovery performance superior to state-of-the-art VC modeling techniques and predictive performance on par with more flexible but less interpretable nonparametric regression procedures. We further demonstrate the theoretical near-optimality of VC-BART by synthesizing recent theoretical results about the VC model and BART to derive posterior concentration rates in settings with independent and correlated errors. An R package implementing VC-BART is available at https://github.com/skdeshpande91/VCBART
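
For readers unfamiliar with the setup, the linear VC model described in this abstract can be written roughly as follows. The notation is an assumption chosen for illustration, not taken from the paper: y_i is the outcome, x_{ij} the j-th covariate, z_i the vector of effect modifiers, and each \beta_j an unknown covariate effect function.

```latex
y_i = \beta_0(z_i) + \sum_{j=1}^{p} \beta_j(z_i)\, x_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n.
```

The conventional linear model is the special case in which every \beta_j is constant in z_i. According to the abstract, VC-BART estimates the functions \beta_j using BART.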

Large-scale multiple hypothesis testing with the normal-beta prime prior

Bai

Ghosh

2019

Statistics

We revisit the problem of simultaneously testing the means of n independent normal observations under sparsity. We take a Bayesian approach to this problem by studying a scale-mixture prior known as the normal-beta prime (NBP) prior. To detect signals, we propose a hypothesis test based on thresholding the posterior shrinkage weight under the NBP prior. Taking the loss function to be the expected number of misclassified tests, we show that our test procedure asymptotically attains the optimal Bayes risk when the signal proportion p is known. When p is unknown, we introduce an empirical Bayes variant of our test which also asymptotically attains the Bayes Oracle risk in the entire range of sparsity parameters $p \propto n^{-\epsilon}$, $\epsilon \in (0, 1)$. Finally, we also consider restricted marginal maximum likelihood (REML) and hierarchical Bayes approaches for estimating a key hyperparameter in the NBP prior and examine multiple testing under these frameworks.
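
The idea of thresholding a posterior shrinkage weight can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' code: the model x ~ N(theta, 1), theta | s2 ~ N(0, s2), s2 ~ beta-prime(a, b), the cutoff of 1/2, the default hyperparameters a = b = 0.5, and the function names `shrinkage_weight` and `flag_signals`. The weight is the posterior mean of 1 - kappa with kappa = 1 / (1 + s2), computed by a simple midpoint rule.

```python
import math


def shrinkage_weight(x, a=0.5, b=0.5, n_grid=4000):
    """Posterior mean of 1 - kappa for one observation x (illustrative sketch).

    Assumed model: x ~ N(theta, 1), theta | s2 ~ N(0, s2),
    s2 ~ beta-prime(a, b), kappa = 1 / (1 + s2). The posterior of kappa is
    then proportional to kappa^(b - 1/2) * (1 - kappa)^(a - 1) * exp(-kappa * x^2 / 2).
    Requires a > 0 and b >= 0.5 so the integrand below stays bounded.
    """
    num = 0.0
    den = 0.0
    for i in range(n_grid):
        # Midpoint rule on u in (0, 1); the substitution u = (1 - kappa)^a
        # absorbs the (1 - kappa)^(a - 1) factor and its endpoint singularity.
        u = (i + 0.5) / n_grid
        one_minus_kappa = u ** (1.0 / a)
        kappa = 1.0 - one_minus_kappa
        w = kappa ** (b - 0.5) * math.exp(-0.5 * kappa * x * x)
        num += one_minus_kappa * w
        den += w
    return num / den


def flag_signals(xs, threshold=0.5, a=0.5, b=0.5):
    """Flag observation i as a signal when its shrinkage weight exceeds the cutoff."""
    return [shrinkage_weight(x, a, b) > threshold for x in xs]


if __name__ == "__main__":
    data = [0.0, 1.0, 3.0, 5.0]
    for x, flagged in zip(data, flag_signals(data)):
        print(f"x = {x:4.1f}  weight = {shrinkage_weight(x):.3f}  signal = {flagged}")
```

Observations near zero receive a small weight and are treated as noise, while large observations receive a weight close to one. How the hyperparameters are chosen or estimated is the subject of the paper and is not reproduced here.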

Individual-Level and Neighborhood-Level Risk Factors for Severe Maternal Morbidity

Meeker

Canelón

Bai

et al. 2021

OBJECTIVE: To investigate the association between individual-level and neighborhood-level risk factors and severe maternal morbidity. METHODS: This was a retrospective cohort study of all pregnancies delivered between 2010 and 2017 in the University of Pennsylvania Health System. International Classification of Diseases codes classified severe maternal morbidity according to the Centers for Disease Control and Prevention guidelines. Logistic regression modeling evaluated individual-level risk factors for severe maternal morbidity, such as maternal age and preeclampsia diagnosis. Additionally, we used spatial autoregressive modeling to assess Census-tract, neighborhood-level risk factors for severe maternal morbidity such as violent crime and poverty. RESULTS: Overall, 63,334 pregnancies were included, with a severe maternal morbidity rate of 2.73%, or 272 deliveries with severe maternal morbidity per 10,000 delivery hospitalizations. In our multivariable model assessing individual-level risk factors for severe maternal morbidity, the magnitude of risk was highest for patients with a cesarean delivery (adjusted odds ratio [aOR] 3.50, 95% CI 3.15–3.89), stillbirth (aOR 4.60, 95% CI 3.31–6.24), and preeclampsia diagnosis (aOR 2.71, 95% CI 2.41–3.03). Identifying as White was associated with lower odds of severe maternal morbidity at delivery (aOR 0.73, 95% CI 0.61–0.87). In our final multivariable model assessing neighborhood-level risk factors for severe maternal morbidity, the rate of severe maternal morbidity increased by 2.4% (95% CI 0.37–4.4%) with every 10% increase in the percentage of individuals in a Census tract who identified as Black or African American when accounting for the number of violent crimes and percentage of people identifying as White. CONCLUSION: Both individual-level and neighborhood-level risk factors were associated with severe maternal morbidity. These factors may contribute to rising severe maternal morbidity rates in the United States. 
Better characterization of risk factors for severe maternal morbidity is imperative for the design of clinical and public health interventions seeking to lower rates of severe maternal morbidity and maternal mortality.
