2020
DOI: 10.48550/arxiv.2010.08488
Preprint

The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks

Abstract: Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate covariance function for the task at hand. Our approach constructs a…
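Since the abstract describes the problem only at a high level, the following minimal sketch illustrates the gap the ridgelet prior targets. It is not the paper's ridgelet construction; the architecture, sample sizes, and all parameter values are illustrative assumptions. It compares the function-space covariance induced by a naive i.i.d. Gaussian weight prior on a one-hidden-layer network against a covariance function a user might posit for the task.

```python
# Minimal sketch (not the paper's construction): measure how far the
# function-space covariance induced by a naive Gaussian weight prior is
# from a target GP covariance. All settings below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def target_kernel(x1, x2, lengthscale=0.5):
    # Squared-exponential covariance the user posits for the task.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def bnn_prior_samples(x, n_hidden=200, n_samples=2000, w_std=1.0):
    # Draw functions f(x) = V @ tanh(W x + b) with i.i.d. N(0, w_std^2) weights.
    fs = np.empty((n_samples, x.size))
    for s in range(n_samples):
        W = rng.normal(0.0, w_std, size=(n_hidden, 1))
        b = rng.normal(0.0, w_std, size=n_hidden)
        V = rng.normal(0.0, w_std / np.sqrt(n_hidden), size=n_hidden)
        fs[s] = V @ np.tanh(W * x[None, :] + b[:, None])
    return fs

x = np.linspace(-2.0, 2.0, 25)
K_target = target_kernel(x, x)                       # covariance we would like the prior to induce
K_bnn = np.cov(bnn_prior_samples(x), rowvar=False)   # empirical covariance induced by the weight prior

# A naive weight prior generally does NOT reproduce the desired kernel;
# the ridgelet prior is designed to close exactly this gap.
print("max |K_bnn - K_target| =", np.abs(K_bnn - K_target).max())
```

In the paper's setting the goal is to choose the weight-space prior so that this discrepancy becomes small for the user's chosen covariance function; the sketch above only measures it for a fixed, naive prior.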

Citations: cited by 4 publications (5 citation statements)
References: 34 publications
“…BNN priors. Finally, previous work has investigated the performance implications of neural network priors chosen without reference to the empirical distributions of SGD-trained networks (Ghosh & Doshi-Velez, 2017; Wu et al., 2018; Atanov et al., 2018; Nalisnick, 2018; Overweg et al., 2019; Farquhar et al., 2019; Cui et al., 2020; Rothfuss et al., 2020; Hafner et al., 2020; Matsubara et al., 2020; Tran et al., 2020; Garriga-Alonso & van der Wilk, 2021). While these priors might in certain circumstances offer performance improvements, they did not offer a recipe for finding potentially valuable features to incorporate into the weight priors.…”
Section: Related Work (mentioning)
confidence: 99%
“…If one wants to forego the need for a well-defined divergence, one can also use a hypernetwork (Ha et al., 2016; Krueger et al., 2017) as an implicit distribution of BNN weights and then train the network to match the GP samples on a certain set of function outputs (Flam-Shepherd et al., 2018). Finally, it has recently been discovered that the ridgelet transform (Candès, 1998) can be used to approximate GP function-space distributions with BNN weight-space distributions (Matsubara et al., 2020). As a side note, the reverse can actually be achieved more easily, namely, fitting a GP to the outputs of a BNN, which can also be of interest in certain applications.…”
Section: Function-space Priors (mentioning)
confidence: 99%
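For orientation, the ridgelet transform referenced in the excerpt above admits a classical reconstruction identity; the schematic form below follows Candès (1998) and later work on ridgelet analysis of neural networks, with normalisation constants and admissibility conditions suppressed, and is not necessarily the exact formulation used by Matsubara et al. (2020):

$$
\mathcal{R}[f](w, b) = \int_{\mathbb{R}^d} f(x)\,\psi(\langle w, x\rangle - b)\,\mathrm{d}x,
\qquad
f(x) \propto \int_{\mathbb{R}^d \times \mathbb{R}} \mathcal{R}[f](w, b)\,\sigma(\langle w, x\rangle - b)\,\mathrm{d}w\,\mathrm{d}b,
$$

for an admissible pair of dual function $\psi$ and activation $\sigma$. Reading $\mathcal{R}[f](w, b)$ as the ideal coefficient of a hidden unit with weight $w$ and bias $b$ is what allows a distribution over functions $f$ (such as a GP) to be transported to a distribution over network weights.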
“…spatial distance) as GPs do not scale well with the number of samples for high-dimensional inputs [4,5]. To overcome this issue, Pearce et al. [32] and Matsubara et al. [27] developed BNN analogues of composite GPs. Similar to these studies, our ICK framework can also be viewed as a simulation of composite GPs.…”
Section: NNs With Prior Knowledge (mentioning)
confidence: 99%
“…Related to our proposed methodology, Pearce et al. [32] exploited the fact that a Bayesian neural network (BNN) approximates a GP to construct additive and multiplicative kernels, but they were limited to specific predefined kernels. Matsubara et al. [27] then resolved this limitation by constructing priors on BNN parameters based on the ridgelet transform and its dual, but they did not explicitly show how their approach works for data with multiple sources of information. To our knowledge, none of these existing approaches allows a modeler to choose any appropriate kernel encoding known prior information from multiple sources.…”
Section: Introduction (mentioning)
confidence: 99%