2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00756

Search to Distill: Pearls Are Everywhere but Not the Eyes

Cited by 54 publications (33 citation statements) | References 20 publications

Citation statements (ordered by relevance):
“…In addition, a new reward function is suggested, which can effectively improve the quality of the generated networks and reduce the difficulty of manual hyperparameter tuning. Liu et al. [56] present a novel knowledge distillation [57] approach to NAS, called architecture-aware knowledge distillation (AKD), which finds student models (compressed teacher models) that are best suited for distilling the given teacher model. The authors employ an RL-based NAS method with a KD-guided reward function to search for the best student model for a given teacher model.…”
Section: NAS Based on Reinforcement Learning (mentioning)
confidence: 99%
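The AKD excerpt above hinges on a KD-guided reward: each candidate student sampled by the RL controller is briefly trained under the teacher's supervision, and its resulting validation accuracy is fed back as the reward. The sketch below illustrates one plausible way to wire that up in PyTorch; it is an assumption-laden illustration rather than the paper's implementation, and the `kd_guided_reward` helper, the hyperparameters, and the model/loader objects are all hypothetical.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard Hinton-style KD loss: soft-target KL at temperature T
    blended with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def kd_guided_reward(student, teacher, train_loader, val_loader, device="cpu"):
    """Hypothetical KD-guided reward for an RL-based NAS controller:
    briefly train the sampled student against the teacher, then return
    its validation accuracy as the reward signal."""
    optimizer = torch.optim.SGD(student.parameters(), lr=0.05, momentum=0.9)
    teacher.eval()
    student.train()
    for x, y in train_loader:                       # short proxy training
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        loss = kd_loss(student(x), t_logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    student.eval()                                  # reward = distilled val accuracy
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            correct += (student(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```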
“…We first explored the standard distillation approach, in which we take the best-performing model as the teacher. However, it is known that a wider architectural gap can mean a less effective transfer [19,33]. Thus, we also explore a sequential distillation approach.…”
Section: Born-Again Distillation (mentioning)
confidence: 99%
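The sequential approach mentioned above passes knowledge through a chain of models rather than distilling the best model directly into the final student, so each hop bridges a smaller architectural gap. The sketch below is a minimal illustration of that idea under assumed names: `train_with_kd(student, teacher, loader)` is a hypothetical helper (for example, training with the KD loss sketched earlier), and `models` is an ordered list running from the strongest teacher down to the target student.

```python
def sequential_distillation(models, train_loader, train_with_kd):
    """Minimal sketch of sequential (born-again style) distillation:
    teacher -> intermediate -> ... -> student, one KD hop per link."""
    teacher = models[0]                 # best-performing model starts the chain
    for student in models[1:]:
        train_with_kd(student, teacher, train_loader)
        teacher = student               # the freshly distilled model teaches next
    return teacher                      # the final student in the chain
```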
“…In contrast, our task necessitates a deployment scenario with two models: one for processing the query images and another for processing the gallery. Recently, [18,15] propose to use a large teacher model to guide the architecture search process for a smaller student, which is essentially knowledge distillation in the architecture space. However, our experiments show that knowledge distillation cannot guarantee compatibility, and thus these methods may not succeed in optimizing the architecture in that respect.…”
Section: Related Work (mentioning)
confidence: 99%
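The compatibility concern in the excerpt above can be made concrete with a cross-model retrieval check: embed the queries with one model and the gallery with another, then measure nearest-neighbour accuracy. A distilled student can match its teacher's standalone accuracy yet still score poorly here if the two embedding spaces are not aligned. The sketch below is a hypothetical illustration of such a check (cosine similarity over L2-normalised embeddings), not the cited work's evaluation protocol; all model and loader objects are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_model_retrieval_accuracy(query_model, gallery_model,
                                   query_loader, gallery_loader):
    """Hypothetical compatibility check: queries embedded by one model are
    matched against a gallery embedded by another; low accuracy signals
    misaligned (incompatible) embedding spaces."""
    def embed(model, loader):
        model.eval()
        feats, labels = [], []
        with torch.no_grad():
            for x, y in loader:
                feats.append(F.normalize(model(x), dim=1))
                labels.append(y)
        return torch.cat(feats), torch.cat(labels)

    q_feat, q_lab = embed(query_model, query_loader)
    g_feat, g_lab = embed(gallery_model, gallery_loader)
    nn_idx = (q_feat @ g_feat.t()).argmax(dim=1)    # cosine nearest neighbour
    return (g_lab[nn_idx] == q_lab).float().mean().item()
```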