Posterior Temperature Optimization in Variational Inference for Inverse Problems
Preprint, 2021
DOI: 10.48550/arxiv.2106.07533

Abstract: Cold posteriors have been reported to perform better in practice in the context of Bayesian deep learning (Wenzel et al., 2020). In variational inference, it is common to employ only a partially tempered posterior by scaling the complexity term in the log-evidence lower bound (ELBO). In this work, we first derive the ELBO for a fully tempered posterior in mean-field variational inference and subsequently use Bayesian optimization to automatically find the optimal posterior temperature. Choosing an appropriate p…
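For orientation, a minimal sketch of the two tempering schemes the abstract contrasts; the notation (temperature T, scaling factor lambda, variational posterior q_phi) is assumed here rather than taken from the paper. Partial tempering scales only the complexity (KL) term of the ELBO, whereas a fully tempered posterior raises the entire unnormalized posterior to the power 1/T:

\mathcal{L}_{\lambda}(\phi) = \mathbb{E}_{q_{\phi}(\theta)}\big[\log p(\mathcal{D}\mid\theta)\big] - \lambda\,\mathrm{KL}\big(q_{\phi}(\theta)\,\big\|\,p(\theta)\big), \qquad p_{T}(\theta\mid\mathcal{D}) \propto \big(p(\mathcal{D}\mid\theta)\,p(\theta)\big)^{1/T}.

Setting lambda < 1 (or T < 1) corresponds to a cold posterior; per the abstract, the paper derives the ELBO for the fully tempered case and then finds the temperature automatically with Bayesian optimization.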

Cited by 1 publication (1 citation statement)
References 2 publications
“…We also explained why, in the absence of cooling, the KL term can dominate the data fitting term, typically leading to underfitting of the model, which in practice translates into poor results on all metrics considered. Our work therefore provides a well-grounded theoretical justification for the importance of using a partial tempering in the overparameterized framework, which completes the justifications given by Wenzel et al (2020); Izmailov et al (2021); Nabarro et al (2021); Noci et al (2021); Laves et al (2021). While our theoretical results apply to a neural network with a single hidden layer, we have shown numerically that similar conclusions can be drawn for more general NN architectures.…”
Section: Discussion (supporting)
Confidence: 85%
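To make the quoted point concrete, below is a minimal sketch (not the authors' code; the toy model, data, and the name kl_scale are illustrative) of mean-field variational inference with a partially tempered ELBO in PyTorch. The complexity term is the KL divergence summed over all variational parameters, so in an overparameterized model it can dominate the data-fit term when left unscaled; setting kl_scale < 1 implements the cooling discussed above.

# Minimal sketch of a partially tempered (cold-posterior) ELBO in mean-field VI.
import torch
import torch.distributions as dist

torch.manual_seed(0)

# Toy regression data.
x = torch.linspace(-1.0, 1.0, 50).unsqueeze(1)
y = 2.0 * x + 0.1 * torch.randn_like(x)

# Mean-field Gaussian posterior q(w) = N(mu, sigma^2) over a single weight.
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
prior = dist.Normal(0.0, 1.0)

kl_scale = 0.1  # lambda < 1: down-weights the complexity term ("cold" posterior)
optimizer = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(500):
    q = dist.Normal(mu, log_sigma.exp())
    w = q.rsample()                                        # reparameterized sample
    log_lik = dist.Normal(x * w, 0.1).log_prob(y).sum()    # data-fit term
    kl = dist.kl_divergence(q, prior).sum()                # complexity term
    loss = -(log_lik - kl_scale * kl)                      # negative tempered ELBO
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"posterior mean {mu.item():.2f}, std {log_sigma.exp().item():.2f}")

With kl_scale = 1 and many weights, the summed KL term grows with the number of parameters while the log-likelihood scales with the data, which is the underfitting mechanism the quoted discussion attributes to the absence of cooling.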