2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00501
Iterative Normalization: Beyond Standardization Towards Efficient Whitening

Abstract: Batch Normalization (BN) is ubiquitously employed to accelerate neural network training and improve generalization by performing standardization within mini-batches. Decorrelated Batch Normalization (DBN) further boosts this effectiveness by whitening. However, DBN relies heavily either on a large batch size or on eigendecomposition, which suffers from poor efficiency on GPUs. We propose Iterative Normalization (IterNorm), which employs Newton's iterations for much more efficient whitening, …
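The whitening step named in the abstract can be sketched concretely. Below is a minimal NumPy reconstruction of Newton-Schulz-style iterative whitening in the spirit of IterNorm: center the mini-batch, normalize the covariance by its trace, run T matrix iterations to approximate Sigma^(-1/2), and apply it. The function name, the T=5 default, and the stabilization constant are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def iterative_whitening(X, T=5, eps=1e-5):
    """Whiten a (d, m) mini-batch with Newton-Schulz iterations.

    Illustrative sketch of IterNorm-style whitening: approximates
    Sigma^(-1/2) with T matrix iterations instead of an
    eigendecomposition. Not the authors' released code.
    """
    d, m = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)      # center the batch
    Sigma = Xc @ Xc.T / m + eps * np.eye(d)     # ridged mini-batch covariance
    tr = np.trace(Sigma)
    Sigma_N = Sigma / tr                        # trace-normalize so the iteration converges
    P = np.eye(d)
    for _ in range(T):
        # Newton-Schulz update: P converges to Sigma_N^(-1/2).
        P = 0.5 * (3.0 * P - P @ P @ P @ Sigma_N)
    whitening_matrix = P / np.sqrt(tr)          # undo the trace normalization
    return whitening_matrix @ Xc

# Sanity check: the whitened covariance should be close to the identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 256))
Xw = iterative_whitening(X, T=5)
print(np.round(Xw @ Xw.T / X.shape[1], 2))
```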

Cited by 93 publications (149 citation statements)
References 24 publications
“…The whitening procedure, a.k.a. decorrelated batch normalization, not only standardizes the features but also eliminates correlations in the data. Decorrelated batch normalization can improve both the optimization efficiency and the generalization ability of deep neural networks (Huang et al., 2018; Siarohin et al., 2018; Huang et al., 2019; Pan et al., 2019; Huang et al., 2020; Ermolov et al., 2021).…”
Section: Applications
confidence: 99%
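For contrast, the eigendecomposition-based whitening that DBN performs per mini-batch can be sketched as ZCA whitening. This is an illustrative NumPy version assuming a (d, m) feature matrix; it is not code from any of the cited papers, and it shows the exact-but-GPU-unfriendly step that IterNorm's iterations replace.

```python
import numpy as np

def zca_whitening(X, eps=1e-5):
    """ZCA-whiten a (d, m) mini-batch via eigendecomposition.

    Illustrative sketch of the DBN-style whitening step:
    Sigma^(-1/2) = U diag(lambda^(-1/2)) U^T. Exact, but the
    eigendecomposition is the GPU bottleneck IterNorm avoids.
    """
    d, m = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)     # standardize location
    Sigma = Xc @ Xc.T / m + eps * np.eye(d)    # ridged covariance
    lam, U = np.linalg.eigh(Sigma)             # eigendecomposition
    inv_sqrt = (U * lam ** -0.5) @ U.T         # Sigma^(-1/2), ZCA form
    return inv_sqrt @ Xc                       # decorrelated output
```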
“…Table 1 summarizes the forward computational complexity. As suggested in Li et al. (2018) and Huang et al. (2019), the number of Newton-Schulz (NS) iterations is often set to 5, which achieves reasonable performance. That is, at the same complexity as the NS iteration, our MTP and MPA can match the power series up to degree 16.…”
Section: Matrix Padé Approximants
confidence: 99%
“…There is extensive work on optimizing effectiveness by tuning the batch size. On the one hand, a small batch size leads to high variance in the estimated statistics and weakens training stability (Wu and He, 2018; Huang et al., 2019a). On the other hand, a large batch size reduces the estimation noise, but it incurs a sharp loss landscape (Keskar et al., 2016), making the optimization problem more challenging.…”
Section: Introduction
confidence: 99%