2019
DOI: 10.48550/arxiv.1906.10546
Preprint

Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning

Cited by 4 publications (7 citation statements)
References 5 publications

“…For example, (Romero et al., 2015; Wang et al., 2018; Shen et al., 2018; Ye et al., 2020b) propose using intermediate feature representations as distillation targets instead of just network outputs, and (Tarvainen & Valpola, 2017; Yang et al., 2018; Zhang et al., 2019a) unify student and teacher network training to reduce computational costs. Knowledge distillation has also been extended to distilling multiple teachers, which is termed Knowledge Amalgamation (Shen et al., 2019a; Luo et al., 2019; Ye et al., 2019; …”
Section: Related Work (mentioning)
confidence: 99%
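The feature-level distillation described in the excerpt above can be illustrated with a minimal PyTorch sketch. It is not code from any of the cited papers; the module and argument names (FeatureDistiller, student_dim, teacher_dims) are assumptions chosen for clarity. The idea shown is that a student's intermediate features are passed through one learned adapter per teacher and regressed onto each teacher's intermediate features.

import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: match a student's intermediate features to those of
# several teachers through learned linear adapters (one adapter per teacher).
class FeatureDistiller(nn.Module):
    def __init__(self, student_dim, teacher_dims):
        super().__init__()
        self.adapters = nn.ModuleList(
            [nn.Linear(student_dim, d) for d in teacher_dims]
        )

    def forward(self, student_feat, teacher_feats):
        # student_feat: (batch, student_dim); teacher_feats: list of (batch, d_i)
        loss = 0.0
        for adapter, t_feat in zip(self.adapters, teacher_feats):
            loss = loss + F.mse_loss(adapter(student_feat), t_feat.detach())
        return loss / len(teacher_feats)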
“…In particular, to amalgamate intermediate teacher features, [19] develops an encoder-decoder structure. Luo et al [20] adopt common feature learning to project features of all the teachers and the student close to each other. These CNN-based KA approaches share a common strategy that the student requires a fixed-sized hint (generated mostly by projection), which suffers from extra learning burden and loss of information.…”
Section: B. Model Reusing (mentioning)
confidence: 99%
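The encoder-decoder amalgamation attributed to [19] in the excerpt above can be sketched only loosely, since the excerpt does not give the architecture. In the hypothetical PyTorch sketch below (HintAutoencoder and hint_dim are invented names, not the paper's design), concatenated teacher features are compressed into a fixed-size code and reconstructed, and that code plays the role of the hint a student would later mimic.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch: compress concatenated teacher features into a fixed-size
# "hint" and reconstruct them, so the hint retains teacher information.
class HintAutoencoder(nn.Module):
    def __init__(self, teacher_dims, hint_dim):
        super().__init__()
        total = sum(teacher_dims)
        self.encoder = nn.Sequential(nn.Linear(total, hint_dim), nn.ReLU())
        self.decoder = nn.Linear(hint_dim, total)

    def forward(self, teacher_feats):
        # teacher_feats: list of (batch, d_i) tensors from frozen teachers
        cat = torch.cat([t.detach() for t in teacher_feats], dim=1)
        hint = self.encoder(cat)                     # fixed-size code / hint
        recon_loss = F.mse_loss(self.decoder(hint), cat)
        return hint, recon_loss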
“…Furthermore, there has been an increasing interest in knowledge amalgamation (KA), an extension of KD, where knowledge of several teachers is transferred to one multi-talent student [18]- [23]. For example, [18]- [20] focus on training a student with complementary knowledge from homogeneous tasks, e.g., a couple of classification problems. However, these methods share a common strategy: the intermediate student features are required to mimic the aggregated hints (usually achieved by linear projection).…”
Section: Introduction (mentioning)
confidence: 99%
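The "aggregated hint" strategy this excerpt criticizes can be summarized in a short, non-authoritative sketch (PyTorch; AggregatedHintLoss and its dimensions are hypothetical, not taken from the cited methods): teacher features are linearly projected to a single fixed size, and the student's intermediate features are trained to mimic that aggregated hint.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the aggregated-hint strategy: concatenate teacher
# features, project them to the student's feature size, and make the
# student's intermediate features regress onto the resulting hint.
class AggregatedHintLoss(nn.Module):
    def __init__(self, teacher_dims, student_dim):
        super().__init__()
        self.project = nn.Linear(sum(teacher_dims), student_dim)

    def forward(self, student_feat, teacher_feats):
        cat = torch.cat([t.detach() for t in teacher_feats], dim=1)
        hint = self.project(cat)                 # fixed-size aggregated hint
        return F.mse_loss(student_feat, hint)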
“…The knowledge is distilled via a layer-wise neuron sharing mechanism. CFL [25] distills the knowledge by learning a common feature space, wherein the student model mimics the transformed features of the teachers to aggregate knowledge. Although many such methods are proposed, the models involved are usually limited within grid domain.…”
Section: Related Work (mentioning)
confidence: 99%
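As a rough illustration of the common-feature-space idea this excerpt attributes to CFL [25], the sketch below (PyTorch; CommonFeatureSpace, common_dim, and the plain MSE alignment are simplifying assumptions, not the paper's exact loss) projects both teacher and student features into one shared space and pulls the student's projection toward each teacher's transformed features.

import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch of common feature learning: map student and teacher
# features into a shared space and align the student with each teacher.
class CommonFeatureSpace(nn.Module):
    def __init__(self, student_dim, teacher_dims, common_dim):
        super().__init__()
        self.student_proj = nn.Linear(student_dim, common_dim)
        self.teacher_projs = nn.ModuleList(
            [nn.Linear(d, common_dim) for d in teacher_dims]
        )

    def forward(self, student_feat, teacher_feats):
        s = self.student_proj(student_feat)
        align = 0.0
        for proj, t in zip(self.teacher_projs, teacher_feats):
            align = align + F.mse_loss(s, proj(t.detach()))
        return align / len(teacher_feats)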