2018
DOI: 10.48550/arxiv.1807.06819
Preprint

Self-supervised Knowledge Distillation Using Singular Value Decomposition

Abstract: To address the large training datasets and high computational cost that deep neural networks (DNNs) require, the so-called teacher-student (T-S) DNN, which transfers the knowledge of a teacher DNN (T-DNN) to a student DNN (S-DNN), has been proposed. However, existing T-S DNNs have a limited range of use, and the knowledge of the T-DNN is insufficiently transferred to the S-DNN. To improve the quality of the knowledge transferred from the T-DNN, we propose a new knowledge distillation method using singular value decomposition (SVD). In addition, we define a knowledge transfer as a sel…

citations
Cited by 1 publication
(1 citation statement)
references
References 24 publications
“…The feature-based knowledge type aims to calculate the distillation loss from the intermediate representations of the teacher and student models [39, 40, 41, 42, 43, 44]. The relation-based knowledge type aims to utilize the relation from the feature maps [45, 46, 47, 48]. Although it has a similarity with the previous feature-based KT in the perspective of using the intermediate feature map, it is distinguished from using the manipulated function of the feature maps such as the Gram matrix [45].…”
Section: System Model (mentioning)
confidence: 99%
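
The distinction drawn in the citation statement above can be illustrated with a small sketch. It is not taken from the cited paper or from the SVD preprint. The function names (`feature_loss`, `gram_matrix`, `relation_loss`) and the toy feature maps are hypothetical, and mean squared error is assumed as the distance in both cases. A feature-based loss compares the intermediate feature maps directly, while a relation-based loss compares a function of them, here the channel-by-channel Gram matrix.

```python
def feature_loss(teacher, student):
    """Feature-based (assumed MSE): compare the feature maps element by element."""
    diffs = [(t - s) ** 2
             for t_ch, s_ch in zip(teacher, student)
             for t, s in zip(t_ch, s_ch)]
    return sum(diffs) / len(diffs)


def gram_matrix(feat):
    """Gram matrix of a feature map given as a list of flattened channels.

    Entry (i, j) is the mean product of channel i and channel j over positions.
    """
    n = len(feat[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in feat]
            for ci in feat]


def relation_loss(teacher, student):
    """Relation-based (assumed MSE): compare Gram matrices, not raw features."""
    return feature_loss(gram_matrix(teacher), gram_matrix(student))


# Hypothetical toy data: two channels, four flattened spatial positions each.
teacher = [[1.0, 2.0, 3.0, 4.0],
           [0.0, 1.0, 0.0, 1.0]]
# The student holds the same values with positions reversed in every channel.
student = [ch[::-1] for ch in teacher]

print(feature_loss(teacher, student))   # 3.0
print(relation_loss(teacher, student))  # 0.0
```

In this toy case the student's features differ from the teacher's position by position, so the feature-based loss is nonzero, yet the relations between channels are identical, so the Gram-matrix loss vanishes. That is one way to see why the two knowledge types are treated as distinct even though both read intermediate feature maps.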