2018
DOI: 10.48550/arxiv.1806.02375
Preprint

Understanding Batch Normalization

Cited by 25 publications (25 citation statements)
References 0 publications
“…Normalization layer An initial motivation for devising Batch Normalization (BN) (Ioffe & Szegedy, 2015) was to address internal covariate shift. However, this explanation was challenged by follow-up studies, and the benefits of BN for training were analyzed from various perspectives (Bjorck et al., 2018; Santurkar et al., 2018). Since then, several normalization layers devised for various computer vision tasks have been proposed, each with its respective advantages (Ulyanov et al., 2016; Wu & He, 2018; Hoffer et al., 2017).…”
Section: Related Work
confidence: 99%
“…Normalization methods such as batch normalization [10] and its variants [19, 20] are essential building blocks for training deep neural network models [21]. Empirically, one of the benefits of batch normalization is stable training dynamics even with large learning rates [22], a key factor behind the efficiency of modern deep learning. So, what is the mechanism behind the stable training dynamics enabled by normalization layers?…”
Section: Kinetic Symmetry Breaking Induces Adaptive Optimization
confidence: 99%
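The excerpt above attributes stable training at large learning rates to the per-batch normalization of activations. As a hedged illustration only (not code from the cited papers), the minimal NumPy sketch below shows the standard batch-norm forward pass in training mode: each feature is shifted and rescaled to zero mean and unit variance over the batch before a learnable scale and shift are applied. The function name `batch_norm_forward` and all values are hypothetical.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Minimal batch-normalization forward pass (training mode, illustrative).

    x:     (batch, features) activations
    gamma: (features,) learnable scale
    beta:  (features,) learnable shift
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance activations
    return gamma * x_hat + beta            # learnable scale and shift

# Toy usage: activations with a large, shifted scale are mapped back to a
# well-conditioned range, one common intuition for why large learning rates
# remain stable when normalization layers are present.
x = np.random.randn(64, 8) * 50.0 + 10.0
out = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # approx. 0 and 1
```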
“…We use an architecture inspired by ResNet [13, 14] and optimised for our specific task. The network consists of basic building blocks such as convolution layers, SWISH activation functions [15], batch normalisation (BN) [16], maxpooling, and dropout layers. The architecture is made of convolutions and groups of residual skip connections (identity shortcuts and convolution shortcuts, as defined in [14]).…”
Section: Convolutional Neural Network
confidence: 99%
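For illustration only, the PyTorch sketch below shows a residual building block of the general kind described in the excerpt above: convolution, batch normalisation, and SWISH activation, with either an identity shortcut or a 1x1-convolution shortcut. The class name `ResidualBlock`, the layer sizes, and all hyperparameters are assumptions, not the architecture from the cited work.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical residual block: conv -> BN -> SWISH, with a skip connection."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()  # SWISH activation: x * sigmoid(x)
        # Convolution shortcut when the shape changes, identity shortcut otherwise.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + self.shortcut(x))

# Toy usage with made-up tensor sizes.
block = ResidualBlock(16, 32, stride=2)
y = block(torch.randn(4, 16, 28, 28))
print(y.shape)  # torch.Size([4, 32, 14, 14])
```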