Understanding Batch Normalization

Björck, Johan; Gomes, Carla P.; Selman, Bart; Weinberger, Kilian Q.

doi:10.48550/arxiv.1806.02375

Cited by 25 publications

(25 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Normalization layer An initial motivation of devising Batch Normalization (BN) (Ioffe & Szegedy, 2015) was trying to solve internal covariate shift. However, their belief in internal covariate shift was broken by follow-up studies, and the benefits from BN in terms of training perspective were analyzed in various directions (Bjorck et al, 2018;Santurkar et al, 2018). After that, several normalization layers devised for various computer vision tasks have been proposed with their respective advantage (Ulyanov et al, 2016;Wu & He, 2018;Hoffer et al, 2017).…”

Section: Related Workmentioning

confidence: 99%

Task-Balanced Batch Normalization for Exemplar-based Class-Incremental Learning

Cha¹,

Cho²,

Hwang³

et al. 2022

Preprint

View full text Add to dashboard Cite

Batch Normalization (BN) is an essential layer for training neural network models in various computer vision tasks. It has been widely used in continual learning scenarios with little discussion, but we find that BN should be carefully applied, particularly for the exemplar memory based class incremental learning (CIL). We first analyze that the empirical mean and variance obtained for normalization in a BN layer become highly biased toward the current task. To tackle its significant problems in training and test phases, we propose Task-Balanced Batch Normalization (TBBN). Given each mini-batch imbalanced between the current and previous tasks, TBBN first reshapes and repeats the batch, calculating near task-balanced mean and variance. Second, we show that when the affine transformation parameters of BN are learned from a reshaped feature map, they become less-biased toward the current task. Based on our extensive CIL experiments with CIFAR-100 and ImageNet-100 datasets, we demonstrate that our TBBN is easily applicable to most of existing exemplar-based CIL algorithms, improving their performance by decreasing the forgetting on the previous tasks.

show abstract

Section: Related Workmentioning

confidence: 99%

Task-Balanced Batch Normalization for Exemplar-based Class-Incremental Learning

Cha¹,

Cho²,

Hwang³

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Normalization methods such as batch normalization [10] and its variants [19,20] are essential building blocks to train deep neural network models [21]. Empirically, one of the benefits of batch normalization include stable training dynamics even with large learning rates [22], a key behind the efficiency of modern deep learning. So, what is the mechanism behind this stable training dynamics enabled by normalization layers?…”

Section: Kinetic Symmetry Breaking Induces Adaptive Optimizationmentioning

confidence: 99%

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

Tanaka¹,

Kunin

2021

Preprint

View full text Add to dashboard Cite

In nature, symmetry governs regularities, while symmetry breaking brings texture.Here, we reveal a novel role of symmetry breaking behind efficiency and stability in learning, a critical issue in machine learning. Recent experiments suggest that the symmetry of the loss function is closely related to the learning performance. This raises a fundamental question. Is such symmetry beneficial, harmful, or irrelevant to the success of learning? Here, we demystify this question and pose symmetry breaking as a new design principle by considering the symmetry of the learning rule in addition to the loss function. We model the discrete learning dynamics using a continuous-time Lagrangian formulation, in which the learning rule corresponds to the kinetic energy and the loss function corresponds to the potential energy. We identify kinetic asymmetry unique to learning systems, where the kinetic energy often does not have the same symmetry as the potential (loss) function reflecting the non-physical symmetries of the loss function and the non-Euclidean metric used in learning rules. We generalize Noether's theorem known in physics to explicitly take into account this kinetic asymmetry and derive the resulting motion of the Noether charge. Finally, we apply our theory to modern deep networks with normalization layers and reveal a mechanism of implicit adaptive optimization induced by the kinetic symmetry breaking.

show abstract

“…We are using the architecture inspired by ResNet [13, 14] and optimised for our specific task. The network consists of basic building blocks like convolution layers, SWISH activation functions [15], batch normalisation (BN) [16], maxpooling and dropout layers. The architecture is made of convolutions and groups of residual skip connections (identity shortcuts and convolution shortcuts as defined in [14]).…”

Section: Convolutional Neural Networkmentioning

confidence: 99%

Deep Learning for direct Dark Matter search with nuclear emulsions

Golovatiuk,

Ustyuzhanin,

Alexandrov

et al. 2021

Preprint

View full text Add to dashboard Cite

We propose a new method for discrimination of sub-micron nuclear recoil tracks from an instrumental background in fine-grain nuclear emulsions used in the directional dark matter search. The proposed method is based on the Deep Learning approach and uses a 3D Convolutional Neural Network architecture with parameters optimised by Bayesian search. Unlike previous studies focused on extracting the directional information, we focus on the signal/background separation exploiting the polarisation dependence of the Localised Surface Plasmon Resonance phenomenon. Comparing the proposed method with the conventional cut-based approach shows a significant boost in the rejection power while keeping the signal efficiency at the same level.

show abstract

Understanding Batch Normalization

Cited by 25 publications

References 0 publications

Task-Balanced Batch Normalization for Exemplar-based Class-Incremental Learning

Task-Balanced Batch Normalization for Exemplar-based Class-Incremental Learning

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

Deep Learning for direct Dark Matter search with nuclear emulsions

Contact Info

Product

Resources

About