Bayesian Neural Network Priors Revisited
2021 · Preprint
DOI: 10.48550/arxiv.2102.06571

Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, such simplistic priors are unlikely to either accurately reflect our true beliefs about the weight distributions, or to give optimal performance. We study summary statistics of neural network weights in different networks trained using SGD. We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correla…
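As a minimal sketch of how such summary statistics might be computed (not the paper's code; the layer objects below are placeholders for layers of any trained PyTorch model), one could measure the tail heaviness of fully connected weights via excess kurtosis and the spatial structure of convolutional kernels via correlations between neighbouring filter positions:

```python
# Minimal sketch (not the authors' code): marginal and spatial summary
# statistics of trained weights, assuming a trained PyTorch model is available.
import torch
from scipy import stats

def weight_kurtosis(linear_layer):
    """Excess kurtosis of a fully connected layer's weights.
    Values well above 0 suggest heavier-than-Gaussian tails."""
    w = linear_layer.weight.detach().cpu().flatten().numpy()
    return stats.kurtosis(w)  # Fisher definition: 0 for a Gaussian

def spatial_correlation(conv_layer):
    """Mean correlation between horizontally adjacent spatial positions of a
    conv layer's kernels (one crude summary of spatial weight structure)."""
    w = conv_layer.weight.detach().cpu()     # (out_ch, in_ch, kH, kW)
    w = w.flatten(0, 1)                      # (filters, kH, kW)
    flat = w.flatten(1)                      # one row per filter
    corr = torch.corrcoef(flat.T)            # correlations across positions
    kh, kw = w.shape[1], w.shape[2]
    pairs = [(r * kw + c, r * kw + c + 1)    # horizontally adjacent positions
             for r in range(kh) for c in range(kw - 1)]
    return torch.stack([corr[i, j] for i, j in pairs]).mean().item()
```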

Cited by 15 publications (37 citation statements). References 38 publications (56 reference statements).
“…Tran et al (2020) propose a new prior for Bayesian neural networks inspired by Gaussian processes (Rasmussen & Nickisch, 2010) based on this hypothesis. In concurrent work, Fortuin et al (2021) also explore several alternatives to standard Gaussian priors inspired by the cold posteriors effect. Wilson & Izmailov (2020) on the other hand, argue that vague Gaussian priors in the parameter space induce useful function-space priors.…”
Section: What Is the Effect of Priors in Bayesian Neural Networks? (mentioning; confidence: 99%)
“…(4) is used in calculations, but the data follows, for example, a Student-t distribution (see Section D.2.2 for an example). The prior p(θ), which controls together with the NN architecture the function space of our approximation, may also lead to sub-optimal results using the Bayesian paradigm [143].…”
Section: A4 Posterior Tempering for Model Misspecification (mentioning; confidence: 99%)
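A small, self-contained illustration of this kind of likelihood misspecification (synthetic data, an example assumed here rather than taken from the cited work): data drawn from a heavy-tailed Student-t distribution scores markedly worse under a Gaussian likelihood than under the matching Student-t likelihood.

```python
# Illustrative sketch only: heavy-tailed Student-t data evaluated under a
# Gaussian likelihood, the kind of model misspecification referred to above.
from scipy import stats

data = stats.t.rvs(df=3, size=1000, random_state=0)   # true generative process
gauss_ll = stats.norm.logpdf(data, loc=data.mean(), scale=data.std()).sum()
t_ll = stats.t.logpdf(data, df=3, loc=0.0, scale=1.0).sum()
print(f"Gaussian log-likelihood:  {gauss_ll:.1f}")    # misspecified model
print(f"Student-t log-likelihood: {t_ll:.1f}")        # typically much higher
```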
“…In this regard, a technique that is often used in practice is posterior tempering, i.e., sampling θ values from p(θ|D)^{1/τ} instead of the true posterior (τ = 1), where τ is called the temperature. Specifically, it has been reported in the literature that "cold" posteriors, τ < 1, perform better [146,148,149], although using a more informed prior can potentially remove this effect [143]. Cold posteriors can be interpreted as over-counting the available data using 1/τ replications of it, thus making the posterior more concentrated.…”
Section: A4 Posterior Tempering for Model Misspecification (mentioning; confidence: 99%)
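A minimal sketch of what this target density looks like in code (the log_likelihood and log_prior callables are placeholders, not a specific library's API): sampling from p(θ|D)^{1/τ} amounts to scaling the unnormalised log-posterior by 1/τ.

```python
# Minimal sketch of posterior tempering: the unnormalised log-density a sampler
# would target is the log-posterior scaled by 1/tau.
def tempered_log_posterior(theta, log_likelihood, log_prior, tau=1.0):
    """Unnormalised log-density of p(theta | D)^(1/tau).

    tau < 1 ("cold posterior") sharpens the posterior, behaving like
    replicating the data 1/tau times; tau = 1 recovers standard Bayes.
    """
    return (log_likelihood(theta) + log_prior(theta)) / tau
```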
“…Usually the weights in Bayesian neural networks are assumed to be independent (Neal, 1996;Matthews et al, 2018;Lee et al, 2018;Garriga-Alonso et al, 2019). However, some works (Garriga-Alonso and van der Wilk, 2021; Fortuin et al, 2021) proposed correlated priors for convolutional neural networks since trained weights are empirically strongly correlated. They showed that these correlated priors can improve overall performance.…”
Section: Dependence Properties (mentioning; confidence: 99%)
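As an illustration of what a spatially correlated prior over a convolutional filter can look like (a sketch using an RBF covariance over pixel positions, assumed here for illustration rather than taken from the cited constructions):

```python
# Sketch: a spatially correlated Gaussian prior over a k x k convolutional
# filter, using an RBF covariance over pixel positions instead of an isotropic
# (identity) covariance. Samples are smooth filters rather than white noise.
import numpy as np

def correlated_filter_prior_samples(k=3, lengthscale=1.0, n_samples=4, seed=0):
    coords = np.array([(r, c) for r in range(k) for c in range(k)], dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    cov = np.exp(-0.5 * d2 / lengthscale**2) + 1e-6 * np.eye(k * k)  # RBF + jitter
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(np.zeros(k * k), cov, size=n_samples)
    return samples.reshape(n_samples, k, k)  # each sample is one smooth filter
```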