ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053986
Deep Geometric Knowledge Distillation with Graphs

Abstract: In most cases, deep learning architectures are trained without regard for the number of operations and the energy consumption involved. However, some applications, such as embedded systems, can be resource-constrained during inference. A popular approach to reduce the size of a deep learning architecture consists in distilling knowledge from a bigger network (teacher) to a smaller one (student). Directly training the student to mimic the teacher representation can be effective, but it requires that both share the same latent space d…
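The constraint the abstract points to, that direct representation mimicking needs matching latent dimensions (or an extra adapter), can be made concrete with a small sketch. The tensor names, batch size, and widths below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical intermediate representations; sizes are assumptions for illustration.
teacher_feat = torch.randn(32, 512)   # teacher latent space (512-d)
student_feat = torch.randn(32, 128)   # student latent space (128-d)

# Direct mimicking: an MSE between representations only makes sense if both
# networks share the same latent dimension, so a mismatch forces an extra
# learned projection (adapter) on the student side.
proj = torch.nn.Linear(128, 512)
mimic_loss = F.mse_loss(proj(student_feat), teacher_feat)
```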

Cited by 30 publications (36 citation statements)
References 23 publications
“…As previously mentioned, in this work, we are interested in using graphs to ensure that latent spaces of DL architectures have some desirable properties. The various approaches we introduce in this paper are based on our previous contributions [8,10,14]. However, in this paper, they are presented for the first time using a unified methodology and formalism.…”
Section: Related Work (mentioning)
confidence: 99%
“…As a matter of fact, using these hand-crafted features as intermediate representations can cause sub-optimal solutions [5]. On the other hand, completely removing all constraints on the intermediate representations can cause the learning procedure to exhibit unwanted behavior, such as susceptibility to deviations of the inputs [6][7][8], or redundant features [9,10].…”
Section: Introduction (mentioning)
confidence: 99%
“…Distilling a model into itself, or self-distillation, has also proven to be effective when iterated [23]. While individual knowledge distillation focused on the student mimicking the outputs of the teacher, relational knowledge distillation [24,25] made it reproduce the same relations and distances between training examples, yielding a better representation of the latent space for the student and better generalization capabilities.…”
Section: Distillation (mentioning)
confidence: 99%
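A minimal sketch of the distance-based variant of relational distillation described in the statement above, assuming mean-normalized pairwise Euclidean distances and a batch of features from each network (function and tensor names are illustrative, not from the cited works):

```python
import torch
import torch.nn.functional as F

def pairwise_distances(feats: torch.Tensor) -> torch.Tensor:
    """Euclidean distances between all pairs in the batch, scaled by their mean."""
    d = torch.cdist(feats, feats, p=2)
    return d / (d[d > 0].mean() + 1e-8)

def relational_kd_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    # Only the relations between examples are matched, so the student's and
    # teacher's latent spaces may have different dimensions.
    return F.smooth_l1_loss(pairwise_distances(student_feat),
                            pairwise_distances(teacher_feat))

loss = relational_kd_loss(torch.randn(32, 128), torch.randn(32, 512))
```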
“…Park et al (2019) designed a relational potential function that facilitated transferring the mutual relations of teacher's output to the student. With a similar notion, Lassance et al (2020) built graphs for both the student and teacher. Latent representation geometry was then transferred by measuring the discrepancy between corresponding adjacency matrices.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
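As a rough illustration of the graph-based formulation summarized above: build a similarity graph over a batch for each network and penalize the discrepancy between the two adjacency matrices. This is a sketch under assumptions (cosine similarity, an MSE discrepancy), not the authors' exact construction.

```python
import torch
import torch.nn.functional as F

def adjacency(feats: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity graph over the batch (one node per example)."""
    z = F.normalize(feats, dim=1)
    return z @ z.t()

def geometric_kd_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    # Penalize the discrepancy between the two adjacency matrices so that the
    # student reproduces the geometry of the teacher's latent space.
    return F.mse_loss(adjacency(student_feat), adjacency(teacher_feat))

loss = geometric_kd_loss(torch.randn(32, 128), torch.randn(32, 512))
```

Because only the batch-level graphs are compared, the student and teacher latent dimensions are free to differ, which is the property the abstract contrasts with direct representation mimicking.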