2019
DOI: 10.48550/arxiv.1911.09738
Preprint

Rethinking Normalization and Elimination Singularity in Neural Networks

Siyuan Qiao,
Huiyu Wang,
Chenxi Liu
et al.

Abstract: In this paper, we study normalization methods for neural networks from the perspective of elimination singularity. Elimination singularities correspond to the points on the training trajectory where neurons become consistently deactivated. They cause degenerate manifolds in the loss landscape which will slow down training and harm model performance. We show that channel-based normalizations (e.g. Layer Normalization and Group Normalization) are unable to guarantee a far distance from elimination singularities…
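
To make "channel-based normalization" concrete, here is a minimal NumPy sketch of Group Normalization (Layer Normalization is the single-group special case). The function name, tensor shapes, and hyper-parameters are illustrative assumptions; the paper's analysis of elimination singularities and its proposed remedy are not reproduced here.

```python
# Minimal sketch of a channel-based normalization (Group Normalization).
# Statistics are computed per sample over groups of channels, so they do
# not depend on the batch dimension (unlike Batch Normalization).
import numpy as np

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """x: (N, C, H, W); gamma, beta: (C,) learned affine parameters."""
    n, c, h, w = x.shape
    assert c % num_groups == 0
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)   # per-sample, per-group mean
    var = xg.var(axis=(2, 3, 4), keepdims=True)     # per-sample, per-group variance
    xg = (xg - mean) / np.sqrt(var + eps)
    x = xg.reshape(n, c, h, w)
    return gamma.reshape(1, c, 1, 1) * x + beta.reshape(1, c, 1, 1)

# Layer Normalization is the num_groups == 1 case.
x = np.random.randn(4, 8, 16, 16).astype(np.float32)
y = group_norm(x, num_groups=4,
               gamma=np.ones(8, dtype=np.float32),
               beta=np.zeros(8, dtype=np.float32))
print(y.shape)  # (4, 8, 16, 16)
```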

Cited by 3 publications (3 citation statements)
References 27 publications

“…Huang et al. showed that CWN combined with BN can improve the original networks with only BN. The idea of combining normalizing weights and activations to improve performance has been widely studied [24], [52], [141], [171]. Moreover, Luo et al. proposed cosine normalization [172], which merges layer normalization and weight normalization together.…”
Section: Combining Activation Normalization
confidence: 99%
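
The statement above mentions cosine normalization [172], which normalizes the weight vector and the input at the same time by replacing the dot product with a cosine similarity. Below is a hedged PyTorch sketch under that reading; the class name, initialization, and epsilon are illustrative, not Luo et al.'s reference implementation.

```python
# Sketch of cosine normalization: the pre-activation is the cosine of the
# angle between the weight vector and the input, i.e. the dot product is
# divided by both norms (weight-side and input-side normalization at once).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineLinear(nn.Module):
    def __init__(self, in_features, out_features, eps=1e-8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.eps = eps

    def forward(self, x):
        x_n = x / (x.norm(dim=-1, keepdim=True) + self.eps)                      # unit-norm inputs
        w_n = self.weight / (self.weight.norm(dim=-1, keepdim=True) + self.eps)  # unit-norm weights
        return F.linear(x_n, w_n)  # each output lies in [-1, 1]

y = CosineLinear(64, 10)(torch.randn(32, 64))
print(y.shape)  # torch.Size([32, 10])
```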
“…DeepLabv3+ is composed of an encoder network and a decoder network; in the first version, we change the decoder by replacing all the convolutions with our new version of LP-BNN convolutions and leave the encoder unchanged. In the second variant, we use weight standardization [42] on the convolutional layers of the decoder, replacing batch normalization [22] in the decoder with group normalization [52]. We denote the first version LP-BNN and the second one LP-BNN + GN.…”
Section: Semantic Segmentation
confidence: 99%
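
As a concrete reading of the second variant described above, the sketch below pairs a convolution using Weight Standardization [42] with Group Normalization [52] in place of Batch Normalization. It is a minimal PyTorch illustration; the layer sizes, group count, and epsilon are assumptions, not the LP-BNN authors' code.

```python
# Conv block with Weight Standardization + Group Normalization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d whose weights are standardized per output channel before use."""
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)       # per-filter mean
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5   # per-filter std
        return F.conv2d(x, (w - mean) / std, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

block = nn.Sequential(
    WSConv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.ReLU(inplace=True),
)
print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```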
“…Du et al. (2018) showed that for GD over a one-hidden-layer weight-normalized CNN, with constant probability over initialization, the iterates converge to global minima. Qiao et al. (2019) compared different normalization techniques from the perspective of whether they lead to points where neurons are consistently deactivated. Wu et al. (2019) established the inductive bias of gradient flow with weight normalization for overparameterized least squares and showed that, for a wider range of initializations compared to the normal parameterization, it converges to the minimum L2-norm solution.…”
Section: Related Work
confidence: 99%
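
For reference, the weight normalization discussed in this related-work excerpt reparameterizes each weight vector as w = g · v / ‖v‖, decoupling its length from its direction. The short PyTorch sketch below shows the reparameterization written by hand and via the built-in utility; the tensor sizes are arbitrary.

```python
# Weight-normalization reparameterization: w = g * v / ||v||.
import torch
import torch.nn as nn

v = torch.randn(10, 64, requires_grad=True)       # direction parameters
g = torch.ones(10, 1, requires_grad=True)          # per-row scale parameters
w = g * v / v.norm(dim=1, keepdim=True)            # effective weight matrix

layer = nn.utils.weight_norm(nn.Linear(64, 10))    # built-in equivalent
print(w.shape, layer(torch.randn(3, 64)).shape)
```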