2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00360

Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation

Abstract: A massive number of well-trained deep networks have been released by developers online. These networks may focus on different tasks and in many cases are optimized for different datasets. In this paper, we study how to exploit such heterogeneous pre-trained networks, known as teachers, so as to train a customized student network that tackles a set of selective tasks defined by the user. We assume no human annotations are available, and each teacher may be either single- or multi-task. To this end, we introduce …
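The setting in the abstract, training a student purely from the soft outputs of several pre-trained teachers with no ground-truth labels, can be sketched as a multi-teacher distillation loss. The snippet below is a minimal, hypothetical illustration (the temperature, the per-task logit slicing, and the uniform averaging are assumptions; it is not the paper's adaptive amalgamation scheme):

```python
# Minimal sketch of label-free multi-teacher distillation (hypothetical;
# not the paper's adaptive knowledge-amalgamation method).
import torch
import torch.nn.functional as F

def amalgamation_loss(student_logits, teacher_logits_list, task_slices, T=4.0):
    """Match per-task slices of the student's logits to each teacher's soft outputs.

    student_logits:      (batch, total_classes) raw scores from the student
    teacher_logits_list: one (batch, classes_t) tensor of raw scores per teacher
    task_slices:         one slice per teacher, mapping its classes into the
                         student's output vector
    T:                   softmax temperature
    """
    loss = 0.0
    for t_logits, sl in zip(teacher_logits_list, task_slices):
        s_log_prob = F.log_softmax(student_logits[:, sl] / T, dim=1)
        t_prob = F.softmax(t_logits.detach() / T, dim=1)
        # KL divergence from the (frozen) teacher's distribution to the student's
        loss = loss + F.kl_div(s_log_prob, t_prob, reduction="batchmean") * (T * T)
    return loss / len(teacher_logits_list)
```

Because no annotations are assumed, the only supervision here is the teachers' softened predictions; the user's selective task set determines which slices of the student's output are trained.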

Citations: Cited by 38 publications (25 citation statements)
References: 29 publications
“…In addition to classification tasks [14,39,10], knowledge distillation can also be applied to other tasks such as semantic segmentation [25,17] and depth estimation [29]. Recently, it has also been extended to multitasking [38,34]. By learning from multiple models, the student model can combine knowledge from different tasks to achieve better performance.…”
Section: Data-driven Knowledge Distillation (mentioning)
confidence: 99%
“…Similar limits can be found in recent classifier amalgamation works¹. A few recent works [21][22][23] have been proposed to unify heterogeneous teacher classifiers. Without a predefined dustbin class, [23] requires overlapping classes of objects recognized by the teacher models, otherwise the model fails to find an optimal feature alignment.…”
¹ A detailed comparison is available at https://github.com/zju-vipa/KamalEngine
Section: B. Multi-teacher Knowledge Distillation (mentioning)
confidence: 99%
“…Without a predefined dustbin class, [23] requires overlapping classes of objects recognized by the teacher models, otherwise the model fails to find an optimal feature alignment¹. Both [22] and [23] learn to extract a common feature representation using additional knowledge-amalgamation networks. This causes extra memory use as the number of teachers increases.…”
Section: B. Multi-teacher Knowledge Distillation (mentioning)
confidence: 99%
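The "common feature representation" and the memory cost mentioned in this quote can be illustrated with a small alignment module: the student's intermediate features are projected into each teacher's feature space and regressed against the frozen teacher features, one adapter per teacher. The class below is a hypothetical sketch of that general idea, not the specific amalgamation networks of [22] or [23]:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAmalgamator(nn.Module):
    """Hypothetical sketch: one 1x1-conv adapter per teacher projects the
    student's feature map into that teacher's feature space, so a single
    student representation can be regressed against several heterogeneous
    teachers. The per-teacher adapters are what grows with the teacher count."""

    def __init__(self, student_channels, teacher_channels_list):
        super().__init__()
        self.adapters = nn.ModuleList(
            [nn.Conv2d(student_channels, c_t, kernel_size=1)
             for c_t in teacher_channels_list]
        )

    def forward(self, student_feat, teacher_feats):
        # Teachers are frozen, so their feature maps are detached; the loss is
        # a plain L2 regression of projected student features onto each teacher.
        loss = 0.0
        for adapter, t_feat in zip(self.adapters, teacher_feats):
            loss = loss + F.mse_loss(adapter(student_feat), t_feat.detach())
        return loss / len(teacher_feats)
```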
“…Network Binarization. In the field of model compression [64,44,45,5,46], network binarization techniques aim to save memory occupancy and accelerate the network inference by binarizing network parameters and then utilizing bitwise operations [14,15,4]. In recent years, various CNN binarization methods have been proposed, which can be categorized into direct binarization [6,14,15,20] and optimization-based binarization [40,4,30].…”
Section: Related Work (mentioning)
confidence: 99%
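Direct binarization as described in this quote typically replaces full-precision values by their sign in the forward pass while letting gradients pass through (a straight-through estimator), which is what enables bitwise arithmetic at inference time. The snippet below is a generic illustration of that trick under these assumptions; it is not taken from any of the cited binarization methods:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize to {-1, +1} in the forward pass; use a clipped straight-through
    estimator (identity gradient on [-1, 1]) in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map every value to -1 or +1 (zero goes to +1 to keep the output binary).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1, a common STE variant.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

binarize = BinarizeSTE.apply  # e.g. w_bin = binarize(weights) inside a layer
```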