2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00501
Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher

Cited by 47 publications (14 citation statements)
References: 17 publications
“…However, this could result in a large model size and inefficient inference. In the ML community, researchers have already studied different techniques (e.g., weight quantization [58], [59], [101] and knowledge distillation [37], [67], [98], [185]) to build a lightweight model from a heavyweight one. For instance, Han et al. [59] pruned the network, quantized the parameters, and compressed them using Huffman coding.…”
Section: Research Opportunities
confidence: 99%
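The Han et al. pipeline cited above combines pruning, weight quantization, and Huffman coding. As a rough, hedged sketch of the first two stages (not the cited implementation; the sparsity level, bit-width, and function names are illustrative assumptions), a minimal NumPy version might look like this:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights (illustrative global threshold)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights, num_bits=5):
    """Map weights onto 2**num_bits levels; returns integer codes plus scale/offset."""
    levels = 2 ** num_bits
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (levels - 1)
    codes = np.round((weights - w_min) / scale).astype(np.int32)
    return codes, scale, w_min

# Prune, then quantize; the integer codes (plus scale/offset) are what a
# Huffman coder would subsequently compress losslessly.
w = np.random.randn(256, 256).astype(np.float32)
codes, scale, offset = uniform_quantize(magnitude_prune(w))
w_hat = codes * scale + offset  # dequantized approximation of the pruned weights
```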
“…Researchers have to trade off performance against the cost of deployment, especially in settings where computing resources are strictly limited. Much work has explored lightweight DNNs, such as network pruning [31], [32], knowledge distillation [33], [34], and quantization [35], [36]. DNNs that run with low-precision operations during inference provide power and memory advantages over full precision, and they also benefit low-bit-width artificial-intelligence chip design [37], [38].…”
Section: Total Direct Effect
confidence: 99%
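The low-precision-inference point above can be illustrated with a simulated ("fake") int8 quantization round trip; the symmetric per-tensor scheme and 8-bit width below are assumptions for illustration, not the procedure of the works cited in the excerpt:

```python
import numpy as np

def fake_quant_int8(x):
    """Simulate symmetric per-tensor int8 quantization via a quantize-dequantize round trip."""
    scale = np.max(np.abs(x)) / 127.0            # symmetric range [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale          # dequantize to expose the rounding error

x = np.random.randn(4, 8).astype(np.float32)
x_q = fake_quant_int8(x)
print("max abs quantization error:", np.max(np.abs(x - x_q)))
```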
“…They demonstrated that the "dark knowledge" lies in the output distributions of a large-capacity teacher network and benefits the student's representation learning. Recent works have mainly explored how to better transfer this "dark knowledge" and improve efficiency from various aspects, such as reducing the difference between teacher and student [3,5,18,34], designing student-friendly architectures [16,20], improving distillation efficiency [7,14,27,29], and explaining distillation's working mechanism [1,23].…”
Section: Knowledge Distillation
confidence: 99%
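Since this excerpt hinges on the "dark knowledge" in the teacher's output distribution, a minimal sketch of the vanilla distillation loss (temperature-softened KL plus hard-label cross-entropy, in the spirit of Hinton et al.) is given below; the temperature and weighting are illustrative assumptions, and this is not the student-customized scheme proposed in the paper itself:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Vanilla distillation loss: temperature-softened KL plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale gradients as in Hinton et al.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 10-class problem.
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = kd_loss(student, teacher, labels)
```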