2022
DOI: 10.1609/aaai.v36i6.20613

Up to 100x Faster Data-Free Knowledge Distillation

Abstract: Data-free knowledge distillation (DFKD) has recently been attracting increasing attention from research communities, attributed to its capability to compress a model only using synthetic data. Despite the encouraging results achieved, state-of-the-art DFKD methods still suffer from the inefficiency of data synthesis, making the data-free training process extremely time-consuming and thus inapplicable for large-scale tasks. In this work, we introduce an efficacious scheme, termed as FastDFKD, that allows us to …
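To make the setting concrete, below is a minimal sketch of a generic inversion-based DFKD loop: synthetic inputs are optimized against a frozen teacher, then the student is trained to match the teacher on those inputs. This is not the paper's FastDFKD method; the names (teacher, student, image shape, class count) are hypothetical placeholders, and the inner synthesis loop is exactly the costly step whose inefficiency the abstract refers to.

import torch
import torch.nn.functional as F

def synthesize_batch(teacher, batch_size=64, num_classes=10, steps=200, lr=0.1):
    # Invert the teacher (assumed frozen and in eval mode): optimize noise
    # until the teacher classifies it confidently as the sampled target labels.
    x = torch.randn(batch_size, 3, 32, 32, requires_grad=True)
    targets = torch.randint(0, num_classes, (batch_size,))
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(teacher(x), targets)  # push x toward the target classes
        loss.backward()
        opt.step()
    return x.detach()

def distill_step(student, teacher, x, opt, T=4.0):
    # One KD step: the student mimics the teacher's softened outputs (KL loss).
    opt.zero_grad()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    loss.backward()
    opt.step()
    return loss.item()

The hundreds of optimization steps per synthetic batch in synthesize_batch are what makes conventional DFKD time-consuming for large-scale tasks; FastDFKD's stated goal is to speed up this data-synthesis stage.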

Cited by 38 publications (10 citation statements)
References 20 publications (52 reference statements)
“…Adversarial DFKD (Micaelli and Storkey 2019) utilized adversarial learning to explore the data space more efficiently. Some follow-up work attempted to mitigate the catastrophic overfitting (Binici et al 2022b,a), mode collapse (Fang et al 2021) in DFKD, and to speed up the training process (Fang et al 2022). However, all of these methods necessitate white-box access to the teacher.…”
Section: Related Work
confidence: 99%
“…Fang et al [11] designed a data-free adversarial distillation framework, where the training samples were crafted by a generator with the intention of maximizing the teacher-student discrepancy. Since the generator in [11] took a long time to converge, a meta-learning method was designed in [10] to accelerate the knowledge distillation process.…”
Section: Data-free Knowledge Distillation
confidence: 99%
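The discrepancy-maximizing generator described in this excerpt can be illustrated with a short, hedged sketch. The loss choice (L1 distance between teacher and student logits) and all names (generator, teacher, student, latent_dim) are assumptions for illustration, not the exact formulation of [11].

import torch
import torch.nn.functional as F

def adversarial_round(generator, teacher, student, g_opt, s_opt,
                      latent_dim=128, batch_size=64):
    # Generator step: craft samples on which the student disagrees with the teacher most.
    z = torch.randn(batch_size, latent_dim)
    x = generator(z)
    disc = F.l1_loss(student(x), teacher(x).detach())  # teacher-student discrepancy
    g_opt.zero_grad()
    (-disc).backward()  # gradient ascent: maximize the discrepancy
    g_opt.step()

    # Student step: close the gap on the same generated samples.
    x_det = x.detach()
    with torch.no_grad():
        t_out = teacher(x_det)
    s_opt.zero_grad()
    s_loss = F.l1_loss(student(x_det), t_out)
    s_loss.backward()
    s_opt.step()
    return disc.item(), s_loss.item()

The two players share one discrepancy term: the generator ascends it to find samples the student has not yet learned, and the student descends it, which is the adversarial exploration of the data space the excerpt describes.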
“…Furthermore, the adversarial framework was extended by Choi et al [5] in the context of model quantization by proposing adversarial data-free quantization (DFQ), introducing additional regularization terms that match the mean and standard deviation of the generated pseudo-samples with the teacher model's batch-norm statistics and impose batch categorical entropy maximization, so that samples from each class appear equally often in the generated batch. Fang et al recently introduced FastDFKD [9], an effective method with a meta generator to speed up the DFKD process, delivering a 100-fold increase in the knowledge transfer rate.…”
Section: Related Work
confidence: 99%
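The two regularizers mentioned in this excerpt can be sketched as follows; this is an illustration of the general idea as it commonly appears in the DFKD/DFQ literature, not Choi et al.'s exact implementation, and the function names are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

def bn_statistics_loss(teacher, x):
    # Distance between the batch statistics of the generated samples and the
    # running statistics stored in the teacher's BatchNorm layers.
    losses, hooks = [], []

    def hook(module, inputs, _output):
        feat = inputs[0]
        mean = feat.mean(dim=[0, 2, 3])
        var = feat.var(dim=[0, 2, 3], unbiased=False)
        losses.append(F.mse_loss(mean, module.running_mean)
                      + F.mse_loss(var, module.running_var))

    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(hook))
    logits = teacher(x)
    for h in hooks:
        h.remove()
    return sum(losses), logits

def batch_entropy_loss(logits):
    # Negative entropy of the batch-averaged class distribution; minimizing it
    # pushes the generated batch toward covering all classes roughly equally.
    p = F.softmax(logits, dim=1).mean(dim=0)
    return (p * torch.log(p + 1e-8)).sum()

Both terms would typically be added to the objective used to craft the pseudo-samples, so the synthetic batch both matches the teacher's stored activation statistics and stays balanced across classes.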