2019
DOI: 10.1007/978-3-030-20890-5_30

Stochastic Normalizations as Bayesian Learning

Abstract: In this work we investigate the reasons why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is the randomness of batch statistics. This randomness appears in the parameters rather than in activations and admits an interpretation as practical Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their …

Cited by 10 publications (7 citation statements)
References 7 publications
“…We further investigate how batch size affects the training performance of batch normalized networks (Figure 1), from the perspective of a model's representational capacity. Several works [41,18,17] have shown that batch size is related to the magnitude of stochasticity [2,46] introduced by BN, which also affects the model's training performance. However, the stochasticity analysis [18] is specific to normalization along the batch dimension, and cannot explain why GN with a large group number has significantly worse performance (Figure 2), while our work provides a unified analysis for batch and group normalized networks.…”
Section: Discussion of Previous Work (mentioning)
confidence: 99%
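
To make the batch-size dependence in the statement above concrete, here is a minimal NumPy sketch (not taken from the cited papers; the Gaussian activations and the batch sizes are illustrative assumptions). It shows that the mini-batch mean BN subtracts fluctuates more for small batches, which is the magnitude of stochasticity the cited analyses tie to batch size.

# Illustrative sketch (assumption: one channel with Gaussian activations).
# The fluctuation of the mini-batch mean, i.e. the noise BN injects,
# shrinks roughly as 1/sqrt(batch size).
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(loc=1.0, scale=2.0, size=100_000)

for batch_size in (2, 8, 32, 128):
    batch_means = [
        rng.choice(activations, size=batch_size, replace=False).mean()
        for _ in range(2_000)
    ]
    print(f"batch size {batch_size:4d}: std of mini-batch mean = {np.std(batch_means):.3f}")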
“…BN standardizes the activations within a mini-batch of data, which improves the conditioning of optimization and accelerates training [20,3,40]. The stochasticity of normalization introduced along the batch dimension is believed to benefit generalization [51,41,18]. However, this stochasticity also results in differences between the training distribution (using mini-batch statistics) and the test distribution (using estimated population statistics) [19], which is believed to be the main cause of BN's small-batch-size problem: BN's error increases rapidly as the batch size becomes smaller [51].…”
Section: Introduction (mentioning)
confidence: 99%
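
The train/test discrepancy described above can be sketched in a few lines of PyTorch (a hypothetical example; the layer width, batch size, and data are arbitrary choices, not taken from the cited works). In train() mode BatchNorm1d normalizes with the current mini-batch statistics, while in eval() mode it uses the running population estimates, so the very same input is transformed differently.

# Hedged sketch of BN's train-vs-test behaviour (arbitrary sizes and data).
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(num_features=4)   # default momentum 0.1
x = torch.randn(2, 4) * 3 + 5         # a tiny mini-batch of 2 samples

bn.train()
y_train = bn(x)   # normalized with the mean/var of these 2 samples
bn.eval()
y_eval = bn(x)    # normalized with running_mean / running_var instead

print("max |train - eval| output gap:", (y_train - y_eval).abs().max().item())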
“…The proposed DJ approach is a flexible procedure that is applicable to a wide range of DL methods. Shekhovtsov et al. [390] investigated the cause of the improved generalization performance of deep networks under Batch Normalization (BN). They argued that the randomness of batch statistics was one of the prime reasons.…”
Section: Other UQ Techniques (mentioning)
confidence: 99%
“…One important property of BN is its ability to improve the generalization of DNNs. It is believed that such an improvement comes from the stochasticity/noise introduced by normalization over batch data [8], [105], [205]. Both the normalized output (Eqn.17) and the population statistics (Eqn.18) can be viewed as stochastic variables, because they depend on the mini-batch inputs, which are sampled from the dataset.…”
Section: Stochasticity for Generalization (mentioning)
confidence: 99%
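
As a rough illustration of that view (an assumption-laden NumPy sketch, not code from the cited survey): normalizing one fixed activation inside different randomly drawn mini-batches yields different outputs, so the BN transform of a given example is itself a random variable induced by the sampling of its batch mates.

# Sketch: the BN output of a fixed example depends on its mini-batch.
import numpy as np

rng = np.random.default_rng(1)
dataset = rng.normal(size=10_000)   # activations of one channel over a dataset
x = 0.7                             # a fixed example

outputs = []
for _ in range(5):
    batch = np.append(rng.choice(dataset, size=31), x)   # mini-batch containing x
    mu, var = batch.mean(), batch.var()
    outputs.append((x - mu) / np.sqrt(var + 1e-5))        # BN transform of x

print("normalized values of the same example:", np.round(outputs, 3))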