2020 · DOI: 10.1609/aaai.v34i04.5816

Adversarially Robust Distillation

Abstract: Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In a…

Cited by 105 publications (122 citation statements) · References 20 publications
“…One effective way to train an adversarially robust model is adversarial training (Madry et al., 2017; Zhang et al., 2019; Engstrom et al., 2019), which adds adversarial perturbations to the inputs during training and forces the model to learn robust predictions. Goldblum et al. (2020) follow the same idea and formulate an adversarially robust distillation (ARD) objective using adversarial training:…”
Section: Related Work · Mentioning · Confidence: 99%
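For orientation, the general shape of such a combined objective can be written as a sketch, assuming the usual knowledge-distillation loss with an inner adversarial maximization; the mixing weight α, distillation temperature t, and perturbation budget ε below are illustrative notation rather than a verbatim reproduction of the cited equation:

\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \, \alpha\, t^{2} \max_{\|\delta\|_{\infty} \le \epsilon} \mathrm{KL}\!\left( S_{\theta}^{t}(x+\delta) \,\|\, T^{t}(x) \right) \; + \; (1-\alpha)\, \ell_{\mathrm{CE}}\!\left( S_{\theta}(x),\, y \right) \Big]

Here S_θ^t and T^t denote the student's and teacher's temperature-softened outputs; the inner maximization perturbs the input against the student while the teacher is always evaluated on the clean image.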
“…Besides, we show two ways to combine our method with adversarial training strategies for KD using ARD (Goldblum et al., 2020), i.e., KDIGA-ARD C and KDIGA-ARD A. The objectives for them are arg min…”
Section: Problem Formulation · Mentioning · Confidence: 99%
“…Papernot et al. [24] introduced defensive distillation, using the knowledge extracted from the original DNN to reduce the effectiveness of adversarial examples. Goldblum et al. [11] distilled robustness onto student networks by encouraging them to mimic the output of the teacher within an ϵ-ball of training instances.…”
Section: Related Work, 5.1 Knowledge Distillation · Mentioning · Confidence: 99%
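That ϵ-ball mimicry can be illustrated with a minimal PyTorch-style sketch of one training step, assuming a fixed robust teacher and a trainable student. The function name ard_step and every hyperparameter default (epsilon, step_size, num_steps, temperature, alpha) are hypothetical placeholders, not the authors' released code.

import torch
import torch.nn.functional as F

def ard_step(student, teacher, x, y, optimizer,
             epsilon=8/255, step_size=2/255, num_steps=10,
             temperature=30.0, alpha=1.0):
    # Softened teacher predictions on the *clean* inputs (teacher stays fixed).
    teacher.eval()
    student.eval()
    with torch.no_grad():
        t_probs = F.softmax(teacher(x) / temperature, dim=1)

    # Inner maximization: search the L-infinity epsilon-ball for a perturbation
    # that pushes the student's softened output away from the teacher's.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(num_steps):
        s_log_probs = F.log_softmax(student(x + delta) / temperature, dim=1)
        kl = F.kl_div(s_log_probs, t_probs, reduction='batchmean')
        grad, = torch.autograd.grad(kl, delta)
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon)
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta = ((x + delta).clamp(0.0, 1.0) - x).detach().requires_grad_(True)

    # Outer minimization: distill the teacher's clean predictions onto the
    # student's outputs at the adversarial point, optionally mixed with a
    # clean cross-entropy term weighted by (1 - alpha).
    student.train()
    s_adv_log_probs = F.log_softmax(student(x + delta.detach()) / temperature, dim=1)
    loss = alpha * temperature ** 2 * F.kl_div(s_adv_log_probs, t_probs,
                                               reduction='batchmean')
    if alpha < 1.0:
        loss = loss + (1.0 - alpha) * F.cross_entropy(student(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The inner loop crafts the perturbation against the student's agreement with the teacher on the clean image, so the student is trained to match the teacher throughout the ϵ-ball rather than only at the training points themselves.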
“…Friendly Adversarial Training (FAT) [31], Misclassification Aware adveRsarial Training (MART) [18], Robust Self-Training (RST) [23], Unsupervised Adversarial Training (UAT) [32], Guided Adversarial Training (GAT) [33], Max-Margin AT [34], using Max-Mahalanobis Center (MMC) loss [35], accelerated AT [36][37][38], using pre-training [39], incorporating hypersphere embedding [40], self-progressing robust training [41], Adversarial Weight Perturbation (AWP) [19], Adversarial Distributional Training (ADT) [42], Channel-wise Activation Suppressing (CAS) [21], Geometry-Aware Instance-Reweighted Adversarial Training (GAIRAT) [43] and robustness distillation [44,45].…”
Section: Adversarial Training · Mentioning · Confidence: 99%