2019
DOI: 10.48550/arxiv.1905.02161
Preprint

Batch Normalization is a Cause of Adversarial Vulnerability

Abstract: Batch normalization (batch norm) is often used in an attempt to stabilize and accelerate training in deep neural networks. In many cases it indeed decreases the number of parameter updates required to achieve low training error. However, it also reduces robustness to small adversarial input perturbations and noise by double-digit percentages, as we show on five standard datasets. Furthermore, substituting weight decay for batch norm is sufficient to nullify the relationship between adversarial vulnerability an…
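A minimal numerical illustration of the abstract's claim, under the standard batch-norm formula (the `batchnorm` helper and toy data below are hypothetical, not from the paper): normalizing by the batch standard deviation rescales a low-variance input direction by roughly 1/σ, so a small perturbation to one sample can be strongly amplified in the normalized output.

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    """Standard batch normalization over the batch axis (no affine params)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, size=(256, 1))  # feature with small std (~0.01)

x_pert = x.copy()
x_pert[0, 0] += 1e-3                      # tiny perturbation to one sample

# how much larger the change is in the normalized output than in the input
amplification = abs(batchnorm(x_pert)[0, 0] - batchnorm(x)[0, 0]) / 1e-3
print(amplification)                      # on the order of 1/std
```

Here the perturbation survives normalization nearly intact in absolute terms, but the feature itself has been compressed to unit scale, so relative to the normalized feature the perturbation is about 1/σ times larger.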

Cited by 19 publications (24 citation statements)
References 11 publications
“…We also define the DENSE set, since networks with many operations per cell and complex connectivity are underexplored in the literature despite their potential [27]. Next, we define the BN-FREE set that is of interest due to BN's potential negative side-effects [52,53] and the difficulty, or lack of need, of using it in some cases [54-56, 36, 37]. We finally add the RESNET/VIT set with two predefined image classification architectures: the commonly-used ResNet-50 [8] and a smaller 12-layer version of the Visual Transformer (ViT) [20] that has recently received a lot of attention in the vision community.…”
Section: DeepNets-1M
confidence: 99%
“…In [19], it is argued that ReLU-activated neural networks always have open decision boundaries, which leaves the risk of high responses for unseen OOD samples. Another paper argues that batch normalization is also a cause of adversarial vulnerability [6]. Such network vulnerabilities are hard to reflect in clean medical image benchmark datasets.…”
Section: Introduction
confidence: 99%
“…For AFF, following the settings in [27], we also use a 10-step ℓ∞ PGD attack with ε = 8/255 to generate adversarial perturbations during training, and train the entire network parameters f_θ and the linear classifier with the TRADES loss for 25 epochs, with an initial learning rate of 0.1 that is decayed by 0.1× at epochs 15 and 20. We report the AA, RA and SA for the best possible model for every method under every setting. TRIBN: Customized batch normalization. It has recently been shown in [27,62,63] that batch normalization (BN) could play a vital role in robust training with 'mixed' normal and adversarial data. Thus, a careful study of the BN strategy of ADVCL is needed, since two types of adversarial perturbations are generated in Eq.…”
confidence: 99%
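The attack setup quoted above (a multi-step ℓ∞ PGD with ε = 8/255) can be sketched as follows. The toy quadratic loss and `grad_fn` are illustrative stand-ins for a network's input gradient, not the cited method; a minimal sketch, assuming the usual random-start, gradient-sign formulation of PGD.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=8 / 255, steps=10, alpha=2 / 255, rng=None):
    """k-step l_inf PGD: ascend the loss along sign(grad), projecting each
    step back into the eps-ball around x and the valid pixel range [0, 1]."""
    rng = rng or np.random.default_rng(0)
    x_adv = np.clip(x + rng.uniform(-eps, eps, x.shape), 0.0, 1.0)  # random start
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # gradient-sign ascent
        x_adv = np.clip(x_adv, x - eps, x + eps)         # project to eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                 # keep valid pixel range
    return x_adv

# toy target: loss = 0.5 * ||x||^2, so the input gradient is simply x
x = np.full(4, 0.5)
x_adv = pgd_linf(x, grad_fn=lambda z: z, eps=8 / 255, steps=10)
```

With 10 steps of size 2/255 the iterate saturates the ε-ball, so each coordinate of `x_adv` ends up exactly at `x + 8/255` for this toy loss.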
“…Besides, we use the other BN for normally transformed data, i.e., (τ1(x), τ2(x), x_h). Compared with existing work [27,62,63] that used two BNs (one for adversarial data and the other for benign data), our proposed ADVCL calls for triple BNs (TRIBN).…”
confidence: 99%
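The dual/triple-BN idea discussed in these excerpts can be sketched as follows; a minimal training-mode-only sketch in numpy, where the `BatchNorm` and `MultiBN` names are hypothetical. The point is that each data type (benign, adversarial, ...) is routed through its own normalization layer, so adversarial batches do not contaminate the benign statistics.

```python
import numpy as np

class BatchNorm:
    """Minimal batch-norm layer (training mode only), per-feature statistics."""
    def __init__(self, dim, eps=1e-5):
        self.gamma = np.ones(dim)
        self.beta = np.zeros(dim)
        self.eps = eps

    def __call__(self, x):
        mu, var = x.mean(axis=0), x.var(axis=0)
        return self.gamma * (x - mu) / np.sqrt(var + self.eps) + self.beta

class MultiBN:
    """One BN per input branch, as in dual/triple-BN robust training:
    each data type keeps its own normalization statistics."""
    def __init__(self, dim, n_branches=3):
        self.bns = [BatchNorm(dim) for _ in range(n_branches)]

    def __call__(self, x, branch):
        return self.bns[branch](x)

rng = np.random.default_rng(0)
mbn = MultiBN(dim=8, n_branches=3)
clean = rng.normal(0.0, 1.0, (32, 8))
adv = clean + 5.0  # crude constant-shift stand-in for perturbed data
out_clean = mbn(clean, branch=0)   # benign batch uses BN 0
out_adv = mbn(adv, branch=1)       # adversarial batch uses BN 1
```

Because each branch normalizes with its own batch statistics, the shifted "adversarial" batch is whitened independently and leaves the benign branch's statistics untouched; with shared statistics, mixing the two batches would distort both.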