2020 · DOI: 10.1609/aaai.v34i04.5816

Adversarially Robust Distillation

Abstract: Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In a…

Cited by 105 publications (122 citation statements) · References 20 publications
“…One effective way to train an adversarially robust model is adversarial training (Madry et al., 2017; Zhang et al., 2019; Engstrom et al., 2019), which adds adversarial perturbations to the inputs during training and forces the model to learn robust predictions. Goldblum et al. (2020) follow the same idea and formulate an adversarially robust distillation (ARD) objective using adversarial training:…”
Section: Related Work · Mentioning · Confidence: 99%
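For orientation, the general shape of such a combined objective can be written as a sketch, assuming the usual knowledge-distillation loss with an inner adversarial maximization; the mixing weight α, distillation temperature t, and perturbation budget ε below are illustrative notation rather than a verbatim reproduction of the cited equation:

\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \, \alpha\, t^{2} \max_{\|\delta\|_{\infty} \le \epsilon} \mathrm{KL}\!\left( S_{\theta}^{t}(x+\delta) \,\|\, T^{t}(x) \right) \; + \; (1-\alpha)\, \ell_{\mathrm{CE}}\!\left( S_{\theta}(x),\, y \right) \Big]

Here S_θ^t and T^t denote the student's and teacher's temperature-softened outputs; the inner maximization perturbs the input against the student while the teacher is always evaluated on the clean image.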
“…Besides, we show two ways to combine our method with adversarial training strategies for KD using ARD (Goldblum et al., 2020), i.e., KDIGA-ARD C and KDIGA-ARD A. The objectives for them are arg min…”
Section: Problem Formulation · Mentioning · Confidence: 99%
“…Papernot et al. [24] introduced defensive distillation, using the knowledge extracted from the original DNN to reduce the effectiveness of adversarial examples. Goldblum et al. [11] distilled robustness onto student networks by encouraging them to mimic the output of the teacher within an ϵ-ball of training instances.…”
Section: Related Work, 5.1 Knowledge Distillation · Mentioning · Confidence: 99%
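That ϵ-ball mimicry can be illustrated with a minimal PyTorch-style sketch of one training step, assuming a fixed robust teacher and a trainable student. The function name ard_step and every hyperparameter default (epsilon, step_size, num_steps, temperature, alpha) are hypothetical placeholders, not the authors' released code.

import torch
import torch.nn.functional as F

def ard_step(student, teacher, x, y, optimizer,
             epsilon=8/255, step_size=2/255, num_steps=10,
             temperature=30.0, alpha=1.0):
    # Softened teacher predictions on the *clean* inputs (teacher stays fixed).
    teacher.eval()
    student.eval()
    with torch.no_grad():
        t_probs = F.softmax(teacher(x) / temperature, dim=1)

    # Inner maximization: search the L-infinity epsilon-ball for a perturbation
    # that pushes the student's softened output away from the teacher's.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(num_steps):
        s_log_probs = F.log_softmax(student(x + delta) / temperature, dim=1)
        kl = F.kl_div(s_log_probs, t_probs, reduction='batchmean')
        grad, = torch.autograd.grad(kl, delta)
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon)
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta = ((x + delta).clamp(0.0, 1.0) - x).detach().requires_grad_(True)

    # Outer minimization: distill the teacher's clean predictions onto the
    # student's outputs at the adversarial point, optionally mixed with a
    # clean cross-entropy term weighted by (1 - alpha).
    student.train()
    s_adv_log_probs = F.log_softmax(student(x + delta.detach()) / temperature, dim=1)
    loss = alpha * temperature ** 2 * F.kl_div(s_adv_log_probs, t_probs,
                                               reduction='batchmean')
    if alpha < 1.0:
        loss = loss + (1.0 - alpha) * F.cross_entropy(student(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The inner loop crafts the perturbation against the student's agreement with the teacher on the clean image, so the student is trained to match the teacher throughout the ϵ-ball rather than only at the training points themselves.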
“…Friendly Adversarial Training (FAT) [31], Misclassification Aware adveRsarial Training (MART) [18], Robust Self-Training (RST) [23], Unsupervised Adversarial Training (UAT) [32], Guided Adversarial Training (GAT) [33], Max-Margin AT [34], using Max-Mahalanobis Center (MMC) loss [35], accelerated AT [36][37][38], using pre-training [39], incorporating hypersphere embedding [40], self-progressing robust training [41], Adversarial Weight Perturbation (AWP) [19], Adversarial Distributional Training (ADT) [42], Channel-wise Activation Suppressing (CAS) [21], Geometry-Aware Instance-Reweighted Adversarial Training (GAIRAT) [43] and robustness distillation [44,45].…”
Section: Adversarial Training · Mentioning · Confidence: 99%