2019
DOI: 10.1007/978-3-030-22796-8_28
Uncertainty Estimation via Stochastic Batch Normalization

Abstract: In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during training and testing. However, inference becomes computationally inefficient. To reduce the memory and computational cost, we propose Stochastic Batch Normalization, an efficient approximation…
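To make the idea concrete, here is a minimal PyTorch-style sketch of a batch-normalization layer that stays stochastic at test time. It is an illustration only, not the authors' implementation: the Gaussian approximation of the batch-statistic distribution, the moving-average estimators, and all class and buffer names (`StochasticBatchNorm1d`, `mu_mean`, `var_var`, etc.) are assumptions.

```python
import torch
import torch.nn as nn

class StochasticBatchNorm1d(nn.Module):
    """BatchNorm layer that samples per-feature batch statistics at test
    time instead of plugging in fixed running averages (sketch only)."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Moving estimates of the distribution of the batch statistics
        # themselves (assumed Gaussian): mean/variance of the batch mean,
        # and mean/variance of the batch variance.
        self.register_buffer("mu_mean", torch.zeros(num_features))
        self.register_buffer("mu_var", torch.ones(num_features))
        self.register_buffer("var_mean", torch.ones(num_features))
        self.register_buffer("var_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mu = x.mean(dim=0)
            var = x.var(dim=0, unbiased=False)
            with torch.no_grad():  # track the statistics' own distribution
                m = self.momentum
                self.mu_mean.mul_(1 - m).add_(m * mu)
                self.mu_var.mul_(1 - m).add_(m * (mu - self.mu_mean) ** 2)
                self.var_mean.mul_(1 - m).add_(m * var)
                self.var_var.mul_(1 - m).add_(m * (var - self.var_mean) ** 2)
        else:
            # Sample batch statistics so the layer behaves consistently
            # with its stochastic training-time behaviour.
            mu = self.mu_mean + self.mu_var.sqrt() * torch.randn_like(self.mu_mean)
            var = self.var_mean + self.var_var.sqrt() * torch.randn_like(self.var_mean)
            var = var.clamp_min(self.eps)
        return self.weight * (x - mu) / (var + self.eps).sqrt() + self.bias
```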

Cited by 33 publications (34 citation statements)
References 4 publications (7 reference statements)
“…We denote the normalization operation as F(·) and the normalized output as x̂ = F(X_B; x). For a certain x, X_B can be viewed as a random variable [2,46]. x̂ is thus a random variable, which shows the stochasticity. It's interesting to explore the statistical momentum of x̂ to measure the magnitude of the stochasticity.…”
Section: Stochastic Normalization Disturbance (mentioning)
confidence: 99%
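The disturbance described in this statement is easy to probe empirically. The following sketch (illustrative only; the toy data, batch size, and variable names are assumptions) resamples mini-batches X_B for a fixed input x and estimates the empirical moments of the normalized output x̂:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=10_000)  # toy training pool
x = 2.5                                             # fixed query point
batch_size, n_resamples = 32, 1_000

x_hat = np.empty(n_resamples)
for i in range(n_resamples):
    batch = rng.choice(data, size=batch_size, replace=False)
    mu, sigma = batch.mean(), batch.std()
    x_hat[i] = (x - mu) / (sigma + 1e-5)            # x_hat = F(X_B; x)

# The spread of x_hat measures the normalization disturbance;
# it shrinks as the batch size grows.
print(f"E[x_hat] ~ {x_hat.mean():.3f}, Var[x_hat] ~ {x_hat.var():.4f}")
```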
“…Another work (Atanov et al. 2018) interprets the mean and variance of mini-batch statistics used in batch normalization (Ioffe and Szegedy 2015) as random variables, since they depend on the stochastic shuffling of training examples into mini-batches during training. Thus, the neural network with batch normalization layers can be viewed as a probabilistic model during training.…”
Section: Related Work (mentioning)
confidence: 99%
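This interpretation can be demonstrated in a few lines: in training mode, a standard BN layer's output for a fixed input depends on the randomly drawn rest of its mini-batch. A small hypothetical demonstration (all values illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(1).train()         # BN in training mode
x = torch.tensor([[1.0]])              # one fixed input

for _ in range(3):
    companions = torch.randn(31, 1)            # a random rest-of-batch
    batch = torch.cat([x, companions], dim=0)  # x grouped with the batch
    print(bn(batch)[0].item())  # x's normalized value changes every time
```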
“…We apply stochastic batch normalization (Atanov et al. 2018) to obtain uncertainty estimates from a deep neural network trained for detecting diabetic retinopathy. Unlike previous works that demonstrate out-of-dataset detection by artificially splitting a dataset by classes (CIFAR5) or generating new images by rotation (notMNIST), we observe that domain shift in real-world datasets is more subtle.…”
Section: Introduction (mentioning)
confidence: 99%
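A common way to turn such stochastic layers into an uncertainty score is to average several stochastic forward passes and take the entropy of the mean prediction. A hedged sketch follows; `model` is assumed to be a classifier whose BN layers resample statistics on every call, and `predictive_entropy` is a hypothetical helper, not part of the cited work:

```python
import torch

@torch.no_grad()
def predictive_entropy(model, x, n_samples=20):
    """Entropy of the mean prediction over stochastic forward passes;
    higher values suggest the input may be out-of-domain."""
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)
    return -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)

# Usage sketch: flag inputs whose entropy exceeds a validation-tuned
# threshold, e.g. images from a shifted acquisition protocol.
```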
“…Concurrently to this work, the authors of [2] have proposed a similar model for BN stochasticity and demonstrated that the distributions of U and V can be used at test time for improving the test data likelihoods and out-of-domain uncertainties. However, they did not explore using this model during the learning.…”
Section: Model of BN Stochasticity (mentioning)
confidence: 96%
“…There are several closely related works concurrent with this submission [20,25,2,15]. Work [20] argues that BN improves generalization because it leads to a smoother objective function, the authors of [15] study the question why BN is often found incompatible with dropout, and works [25,2] observe that randomness in batch normalization can be linked to optimizing a lower bound on the expected data likelihood [2] and to variational Bayesian learning [25]. However, these works focus on estimating the uncertainty of outputs in models that have been already trained using BN.…”
Section: Related Work (mentioning)
confidence: 99%