2020
DOI: 10.1609/aaai.v34i04.5718
Few Shot Network Compression via Cross Distillation

Abstract: Model compression has been widely adopted to obtain lightweight deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which could be challenged by privacy and security issues. As a compromise between privacy and performance, in this paper we investigate few shot network compression: given few samples per class, how can we effectively compress the network with negligible performance drop? The core challenge of few shot network compression…
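The abstract only outlines the setting, so below is a minimal, illustrative sketch (PyTorch; layer sizes, sample counts, and data are placeholder assumptions, not the authors' released code) of few-shot compression by distillation: a narrower student is regressed onto a frozen teacher's outputs using only a handful of samples. Cross distillation, as proposed in the paper, refines this by aligning teacher and student layer by layer.

```python
# Illustrative sketch only: few-shot compression by output distillation.
# All sizes and data below are placeholders, not the paper's actual setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen, pretrained teacher (stand-in for a large CNN).
teacher = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
teacher.requires_grad_(False)

# Compressed student: same depth, narrower hidden layer.
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

# "Few shot": e.g. two samples per class for ten classes; labels are not
# needed when the student is simply regressed onto the teacher's outputs.
few_shot_x = torch.randn(20, 64)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

for step in range(200):
    opt.zero_grad()
    loss = mse(student(few_shot_x), teacher(few_shot_x))
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step}: distillation loss = {loss.item():.4f}")
```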

Cited by 47 publications (38 citation statements)
References 18 publications (33 reference statements)
“…A majority of meta-learning methods include metric-based (Snell et al, 2017; Pan et al, 2019), model-based (Santoro et al, 2016; Bartunov et al, 2020) and model-agnostic approaches (Finn et al, 2017, 2018; Vuorio et al, 2019). Meta-learning can also be applied to KD in some computer vision tasks (Lopes et al, 2017; Jang et al, 2019; Bai et al, 2020; Li et al, 2020). For example, Lopes et al (2017) record per-layer metadata for the teacher model to reconstruct a training set, and then adopt a standard training procedure to obtain the student model.…”
Section: Transfer Learning and Meta-learning (mentioning)
Confidence: 99%
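The Lopes et al (2017) procedure is described above in a single sentence; a hedged sketch of the idea it refers to (record per-layer activation statistics as "metadata", reconstruct surrogate inputs from them, then distill) might look as follows. Shapes, layer choices, and the single mean-matching statistic are assumptions made for illustration, not details from that paper.

```python
# Hedged sketch of metadata-based reconstruction; shapes and the single
# mean-activation statistic are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

teacher = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
teacher.requires_grad_(False)

# 1) Record "metadata": the teacher's mean hidden activation on (simulated)
#    original data, stored instead of the data itself.
real_x = torch.randn(256, 64)
with torch.no_grad():
    meta_mean = teacher[1](teacher[0](real_x)).mean(dim=0)

# 2) Reconstruct surrogate inputs whose activations match the recorded metadata.
synth_x = torch.randn(64, 64, requires_grad=True)
opt = torch.optim.Adam([synth_x], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    loss = F.mse_loss(teacher[1](teacher[0](synth_x)).mean(dim=0), meta_mean)
    loss.backward()
    opt.step()

# 3) Standard distillation of a smaller student on the reconstructed set.
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
synth_x = synth_x.detach()
for _ in range(200):
    s_opt.zero_grad()
    F.mse_loss(student(synth_x), teacher(synth_x)).backward()
    s_opt.step()
```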
“…However, there is a semantic gap between the external knowledge and the samples. Therefore, we propose a knowledge distillation framework [17, 18] for transferring cross-modal knowledge. Recently, many cross-modal knowledge distillation frameworks have been proposed.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Li et al [11] presented few-sample knowledge distillation (FSKD), which is used for network compression where the student model is obtained by pruning the teacher model. Subsequently, Bai et al [12] proposed a novel layer-wise knowledge distillation approach for effectively compressing networks with few data. Recently, Shen et al [13] proposed a novel grafting strategy for few-shot knowledge distillation.…”
Section: Introduction (mentioning)
Confidence: 99%
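Both FSKD and the layer-wise approach mentioned above fit the student to the teacher block by block on a small calibration set. A hedged sketch of plain layer-wise distillation under that reading follows; block count, widths, and optimizer settings are assumptions for illustration, and the actual cross-distillation procedure additionally crosses teacher and student hidden states rather than using the teacher's inputs alone.

```python
# Hedged sketch of layer-wise distillation on a few samples; blocks, widths,
# and optimizer settings are illustrative. (In the pruning setting the student
# blocks would be narrower; equal widths keep this sketch short.)
import torch
import torch.nn as nn

torch.manual_seed(0)

teacher_blocks = nn.ModuleList([nn.Linear(32, 32) for _ in range(3)])
teacher_blocks.requires_grad_(False)
student_blocks = nn.ModuleList([nn.Linear(32, 32) for _ in range(3)])

calib_x = torch.randn(16, 32)   # small "few shot" calibration batch
mse = nn.MSELoss()

t_in = calib_x
for t_block, s_block in zip(teacher_blocks, student_blocks):
    t_out = t_block(t_in)
    opt = torch.optim.Adam(s_block.parameters(), lr=1e-2)
    for _ in range(100):
        opt.zero_grad()
        # Fit the student block to reproduce the teacher block's output given
        # the teacher's own input, so estimation errors do not accumulate
        # across layers (the failure mode cross distillation targets).
        loss = mse(s_block(t_in), t_out)
        loss.backward()
        opt.step()
    t_in = t_out                 # advance along the teacher's feature path
```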