2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00361

Data-Free Learning of Student Networks

Abstract: Learning portable neural networks is essential in computer vision so that pre-trained heavy deep models can be deployed on edge devices such as mobile phones and micro sensors. Most existing deep neural network compression and speed-up methods are effective for training compact deep models when the training dataset can be accessed directly. However, the training data for a given deep network are often unavailable due to practical problems (e.g., privacy, legal issues, and transmissio…

Cited by 266 publications (270 citation statements); references 25 publications.
“…Nayak et al. [26] propose to generate training data for the student using data impressions, modeling the teacher's softmax space with a Dirichlet distribution. Recently proposed methods directly synthesize training data for the student network using adversarial learning [27][28][29][30] or DeepDream data propagation [31][32]. Although these algorithms can train student networks without any of the teacher's training data, none has reported performance comparable to supervised methods.…”
Section: Data-free Knowledge Distillation (mentioning)
confidence: 99%
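The Dirichlet-based "data impressions" mentioned in the statement above can be sketched in a few lines: a soft-label vector is sampled from a Dirichlet distribution over the teacher's softmax space, and a synthetic input is then optimised until the frozen teacher reproduces that label. The PyTorch sketch below only illustrates this idea under assumed names and hyper-parameters (teacher, image_shape, concentration, steps, lr); it is not the cited authors' implementation.

```python
# Hedged sketch of Dirichlet "data impressions" (cf. Nayak et al. [26]).
# All shapes and hyper-parameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def synthesize_data_impression(teacher, num_classes=10, image_shape=(1, 28, 28),
                               concentration=1.0, steps=500, lr=0.05):
    teacher.eval()
    # Sample a target class-probability vector from a Dirichlet distribution
    # over the teacher's softmax space.
    alpha = torch.full((num_classes,), concentration)
    target = torch.distributions.Dirichlet(alpha).sample().unsqueeze(0)

    # Start from random noise and optimise the pixels directly; the teacher
    # stays frozen because only x is handed to the optimiser.
    x = torch.randn(1, *image_shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        log_probs = F.log_softmax(teacher(x), dim=1)
        # Cross-entropy between the sampled soft target and the teacher output.
        loss = -(target * log_probs).sum(dim=1).mean()
        loss.backward()
        optimizer.step()
    return x.detach(), target
```

Repeating this procedure with many sampled targets would yield a pool of synthetic images that can stand in for the missing training set when distilling the student.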
“…In particular, this task could be regarded as a generative adversarial task if we take D_mod and G_sp as the discriminator and generator in a generative adversarial network, respectively. In [22], a generator is trained to map Gaussian noise to images of handwritten digits, and only a well-trained digit-classification module, called the teacher network, is used. There are some similarities between that work and ours.…”
Section: Signal Processing Module (mentioning)
confidence: 99%
“…There are some similarities between that work and ours. Based on our specific task and the experience in [22], we formulate our loss function as…”
Section: Signal Processing Module (mentioning)
confidence: 99%
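For reference [22], where a generator maps Gaussian noise to images and a fixed, pre-trained teacher is the only supervision, a generator-side training step can be sketched as below. The two loss terms shown (a pseudo one-hot loss and an information-entropy loss that balances class usage) and the weight beta are assumptions chosen to illustrate the idea; the original formulation also includes a feature-activation term, and the citing paper's own loss function is not reproduced here.

```python
# Hedged sketch of a data-free generator update in the style of [22]:
# the frozen teacher plays the role of a fixed "discriminator".
import torch
import torch.nn.functional as F

def generator_step(generator, teacher, optimizer, batch_size=128,
                   latent_dim=100, beta=5.0):
    teacher.eval()
    optimizer.zero_grad()

    z = torch.randn(batch_size, latent_dim)   # Gaussian noise input
    fake_images = generator(z)                # generated images, e.g. digits
    logits = teacher(fake_images)
    probs = F.softmax(logits, dim=1)

    # Pseudo one-hot loss: generated samples should be classified confidently
    # by the teacher (targets are the teacher's own argmax predictions).
    pseudo_labels = logits.argmax(dim=1)
    loss_one_hot = F.cross_entropy(logits, pseudo_labels)

    # Information-entropy loss: minimising the negative entropy of the mean
    # prediction encourages the batch to cover all classes evenly.
    mean_probs = probs.mean(dim=0)
    loss_entropy = (mean_probs * torch.log(mean_probs + 1e-8)).sum()

    # The original also adds a feature-activation term, which needs access to
    # the teacher's intermediate features and is omitted in this sketch.
    loss = loss_one_hot + beta * loss_entropy
    loss.backward()
    optimizer.step()
    return loss.item()
```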
“…Among others, Hanting et al. proposed a knowledge distillation method that does not require the original training data during model compression. By combining the ideas of the generative adversarial network (GAN) [47] and knowledge distillation, the student network is able to effectively learn the behavior of the teacher network [48]. In [49], the authors proposed a knowledge distillation method based on the correlation between instances.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
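To make the GAN-plus-distillation combination described in [48] concrete, the complementary student update can be sketched as follows: the student is trained to match the teacher's softened outputs on generator-produced images, so no original training data is involved. The temperature value and KL-divergence form are standard knowledge-distillation assumptions, not details taken from the cited paper.

```python
# Hedged sketch of the distillation step on synthetic data: the student mimics
# the frozen teacher's softened predictions on generator outputs.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, generator, optimizer, batch_size=128,
                 latent_dim=100, temperature=4.0):
    teacher.eval()
    generator.eval()
    optimizer.zero_grad()

    with torch.no_grad():
        z = torch.randn(batch_size, latent_dim)
        fake_images = generator(z)
        teacher_logits = teacher(fake_images)

    student_logits = student(fake_images)
    # KL divergence between temperature-softened teacher and student outputs,
    # scaled by T^2 as is conventional in knowledge distillation.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    loss.backward()
    optimizer.step()
    return loss.item()
```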