2021
DOI: 10.48550/arxiv.2106.05586
Preprint

Data augmentation in Bayesian neural networks and the cold posterior effect

Abstract: Data augmentation is a highly effective approach for improving performance in deep neural networks. The standard view is that it creates an enlarged dataset by adding synthetic data, which raises a problem when combining it with Bayesian inference: how much data are we really conditioning on? This question is particularly relevant to recent observations linking data augmentation to the cold posterior effect. We investigate various principled ways of finding a log-likelihood for augmented datasets. Our approach…

Cited by 10 publications (40 citation statements)
References 29 publications (56 reference statements)

Citation statements, ordered by relevance:
“…Other works in this area have suggested that unprincipled data augmentation could be a contributing factor to the cold posterior effect [e.g. 19,20]. We examine the effect of principled data augmentation and find that the cold posterior effect observed in our work reduces slightly with data augmentation.…”
Section: This Work (mentioning)
confidence: 47%
“…In the context of Bayesian neural networks, data augmentation conflicts with the likelihood, and achieving good performance often requires overcounting the data, a trick commonly used in Bayesian deep learning (Zhang et al, 2018; Osawa et al, 2019). Nabarro et al (2021) instead propose to constrain functions with the augmentation distribution like van der Wilk et al (2018), but to address the cold-posterior effect (Wenzel et al, 2020; Fortuin et al, 2021). Our work uses the same formulation of the likelihood and additionally enables learning the parameters of the data augmentation.…”
Section: Related Work (mentioning)
confidence: 99%
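To make the tension described in the statement above concrete, here is a minimal sketch in illustrative notation (the symbols aug, f, K and the exact averaging scheme are assumptions, not the cited papers' definitions) contrasting naive augmentation, which effectively overcounts the data, with an augmentation-averaged likelihood that keeps one term per observed example:

% Naive augmentation: K augmented copies per example multiply the number of likelihood terms.
\log p(\mathcal{D} \mid \theta) \approx \sum_{i=1}^{n} \sum_{k=1}^{K} \log p\bigl(y_i \mid f(\tilde{x}_{i,k}, \theta)\bigr), \qquad \tilde{x}_{i,k} \sim \mathrm{aug}(\cdot \mid x_i)

% Averaged likelihood: the per-example likelihood is averaged over the augmentation distribution.
\log p(\mathcal{D} \mid \theta) = \sum_{i=1}^{n} \log \mathbb{E}_{\tilde{x}_i \sim \mathrm{aug}(\cdot \mid x_i)} \bigl[\, p\bigl(y_i \mid f(\tilde{x}_i, \theta)\bigr) \bigr]

The first form implicitly conditions on n·K data points, which is the overcounting the quoted statement refers to; the second keeps the effective dataset size at n.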
“…This procedure can equivalently be seen as modifying the prior on functions (which is implied by the prior on weights) (van der Wilk et al, 2018) or as a modification to the likelihood (Nabarro et al, 2021). In either case, the marginal likelihood will now also depend on η because it influences the likelihood function p(y | f(x, θ, η), H).…”
Section: Parameterizing Invariance (mentioning)
confidence: 99%
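A minimal sketch of why the marginal likelihood depends on η in the formulation quoted above (the integral below is illustrative, assumes i.i.d. observations, and is not copied from the cited works):

p(\mathcal{D} \mid \eta, \mathcal{H}) = \int p(\theta \mid \mathcal{H}) \prod_{i=1}^{n} p\bigl(y_i \mid f(x_i, \theta, \eta), \mathcal{H}\bigr) \, \mathrm{d}\theta

Because η enters every likelihood term, maximizing this quantity with respect to η is one way the augmentation (invariance) parameters can be learned.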
“…From that angle, Wenzel et al (2020) suggested that Gaussian priors might not be a good choice for Bayesian neural networks. In some works, data augmentation is argued to be the main reason for this effect (Izmailov et al, 2021; Nabarro et al, 2021), as the increased amount of observed data naturally leads to higher posterior contraction (Izmailov et al, 2021). At the same time, even when data augmentation is accounted for, the cold posterior effect is still present for some models.…”
Section: Future Applications (mentioning)
confidence: 99%
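For reference, the cold posterior effect referred to throughout these statements concerns tempered posteriors of the form below (a standard definition rather than any single cited work's notation), where predictive performance is often best at temperatures T < 1:

p_T(\theta \mid \mathcal{D}) \;\propto\; \bigl( p(\mathcal{D} \mid \theta)\, p(\theta) \bigr)^{1/T}

Setting T = 1 recovers the ordinary Bayes posterior, while T < 1 sharpens it, much as conditioning on additional (augmented) data would; this is the link to posterior contraction made in the statement above.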
“…In addition, Aitchison (2021) demonstrates that the problem might originate in a misspecified model likelihood and that modifying only the likelihood, based on data curation, mitigates the cold posterior effect. Nabarro et al (2021) hypothesize that using an appropriate prior incorporating knowledge of data augmentation might provide a solution. Moreover, heavy-tailed priors have been shown to mitigate the cold posterior effect.…”
Section: Future Applications (mentioning)
confidence: 99%