Consider the problem of high-dimensional variable selection for the Gaussian linear model when the unknown error variance is also of interest. In this paper, we show that the use of conjugate shrinkage priors for Bayesian variable selection can have detrimental consequences for such variance estimation. Such priors are often motivated by the invariance argument of Jeffreys (1961). Revisiting this work, however, we highlight a caveat that Jeffreys himself noticed; namely, that biased estimators can result from inducing dependence between parameters a priori. In a similar way, we show that conjugate priors for linear regression, which induce prior dependence between the regression coefficients and the error variance, can lead to underestimation of the error variance in the Bayesian high-dimensional regression setting. Following Jeffreys, we recommend as a remedy to treat regression coefficients and the error variance as independent a priori. Using such an independence prior framework, we extend the Spike-and-Slab Lasso of Ročková and George (2018) to the unknown variance case. This extended procedure outperforms both the fixed variance approach and alternative penalized likelihood methods on simulated data. On the protein activity dataset of Clyde and Parmigiani (1998), the Spike-and-Slab Lasso with unknown variance achieves lower cross-validation error than alternative penalized likelihood methods, demonstrating the gains in predictive accuracy afforded by simultaneous error variance estimation.
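The underestimation described above can be illustrated with a short worked calculation. This is only a sketch under stated assumptions: it uses a plain Gaussian prior with a hypothetical scale \tau as a stand-in for a shrinkage prior, and the improper prior \pi(\sigma^2) \propto \sigma^{-2}; the priors actually used in the paper may differ. It compares the conditional posterior mode of \sigma^2 given \beta under a conjugate prior and under an independence prior.

```latex
% Model: y = X\beta + \varepsilon, \varepsilon \sim N(0, \sigma^2 I_n), with n observations and p predictors.
% Assumed for illustration: Gaussian prior with scale \tau, and \pi(\sigma^2) \propto \sigma^{-2}.

% Conjugate prior: \beta \mid \sigma^2 \sim N(0, \sigma^2 \tau^2 I_p)
\[
\pi(\beta, \sigma^2 \mid y) \propto (\sigma^2)^{-\frac{n+p+2}{2}}
  \exp\left\{ -\frac{\|y - X\beta\|^2 + \|\beta\|^2/\tau^2}{2\sigma^2} \right\}
\quad\Longrightarrow\quad
\hat{\sigma}^2_{\mathrm{conj}}(\beta) = \frac{\|y - X\beta\|^2 + \|\beta\|^2/\tau^2}{n + p + 2}.
\]

% Independence prior: \beta \sim N(0, \tau^2 I_p), independent of \sigma^2
\[
\pi(\beta, \sigma^2 \mid y) \propto (\sigma^2)^{-\frac{n+2}{2}}
  \exp\left\{ -\frac{\|y - X\beta\|^2}{2\sigma^2} \right\}
  \exp\left\{ -\frac{\|\beta\|^2}{2\tau^2} \right\}
\quad\Longrightarrow\quad
\hat{\sigma}^2_{\mathrm{ind}}(\beta) = \frac{\|y - X\beta\|^2}{n + 2}.
\]
```

Under the conjugate prior, the denominator grows with p while, for a sparse \beta, the numerator does not. When p is much larger than n the conditional mode is therefore pulled toward zero, whereas the independence prior leaves the usual residual-based scaling by n intact.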
Bayesian modeling has become a staple for researchers analyzing data. Thanks to recent developments in approximate posterior inference, modern researchers can easily build, use, and revise complicated Bayesian models for large and rich data. These new abilities, however, bring into focus the problem of model assessment. Researchers need tools to diagnose the fitness of their models, to understand where a model falls short, and to guide its revision. In this paper we develop a new method for Bayesian model checking, the population predictive check (Pop-PC). Pop-PCs are built on posterior predictive checks (PPCs), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. Though powerful, PPCs use the data twice, both to calculate the posterior predictive and to evaluate it, which can lead to overconfident assessments. Pop-PCs, in contrast, compare the posterior predictive distribution to the population distribution of the data. This strategy blends Bayesian modeling with frequentist assessment, leading to a robust check that validates the model on its generalization. Of course, the population distribution is not usually available; thus we use tools like the bootstrap and cross-validation to estimate the Pop-PC. Further, we extend Pop-PCs to hierarchical models. We study Pop-PCs on classical regression and a hierarchical model of text. We show that Pop-PCs are robust to overfitting and can be easily deployed on a broad family of models.
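The contrast between checking on the fitting data and checking against fresh data can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes a toy model y_i ~ N(mu, noise_var) with a conjugate N(0, prior_var) prior on mu, uses a held-out split as a crude stand-in for the population distribution, and every function name and parameter below is hypothetical.

```python
import math
import random


def posterior_mu(y, prior_var=100.0, noise_var=1.0):
    """Conjugate posterior of mu for y_i ~ N(mu, noise_var), mu ~ N(0, prior_var)."""
    post_var = 1.0 / (1.0 / prior_var + len(y) / noise_var)
    post_mean = post_var * sum(y) / noise_var
    return post_mean, post_var


def discrepancy(y, mu):
    """Mean squared deviation of the data from mu."""
    return sum((v - mu) ** 2 for v in y) / len(y)


def predictive_p_value(y_fit, y_eval, rng, n_draws=500, noise_var=1.0):
    """Fraction of posterior draws whose replicated discrepancy is at least
    as large as the discrepancy of y_eval.

    y_eval = y_fit gives a classical posterior predictive check (data used twice).
    y_eval = held-out data evaluates the same posterior on data it has not seen.
    """
    post_mean, post_var = posterior_mu(y_fit, noise_var=noise_var)
    noise_sd = math.sqrt(noise_var)
    exceed = 0
    for _ in range(n_draws):
        mu = rng.gauss(post_mean, math.sqrt(post_var))
        y_rep = [rng.gauss(mu, noise_sd) for _ in range(len(y_eval))]
        if discrepancy(y_rep, mu) >= discrepancy(y_eval, mu):
            exceed += 1
    return exceed / n_draws


rng = random.Random(0)
y = [rng.gauss(1.0, 1.0) for _ in range(200)]  # simulated, well-specified data
train, held_out = y[:100], y[100:]

ppc_p = predictive_p_value(train, train, rng)         # fit and evaluate on the same data
heldout_p = predictive_p_value(train, held_out, rng)  # fit on train, evaluate on held-out data
print(ppc_p, heldout_p)
```

A p-value near 0 or 1 signals that the replicated data do not resemble the evaluation data. In this sketch a single split is used for simplicity; the abstract's bootstrap and cross-validation estimators would replace that split.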
Machine learning is a useful tool for accelerating materials discovery; however, it is a challenge to develop accurate methods that successfully transfer between domains while also broadening the scope of reaction conditions considered. In this paper, we consider how active- and transfer-learning methods can be used as building blocks for predicting reaction outcomes of metal halide perovskite synthesis. We then introduce a serendipity-based recommendation system that guides these methods to balance novelty and accuracy. The model-agnostic recommendation system is tested across active- and transfer-learning algorithms, using laboratory experiments for training and testing and a time-separated holdout that includes four different chemical systems. The serendipity recommendation system achieves high accuracy while increasing the scope of the synthesis conditions explored.