2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01328

Tree-like Decision Distillation

Cited by 19 publications (10 citation statements, all of type "mentioning"). References 20 publications.
“…Over the last several years, most works have devoted themselves to exploring "what to distill", i.e., the forms of knowledge used for distillation. Representative forms include soft targets [1], features [6], [12], attention [7], factors [9], activation boundaries [10], sample relationships [3], [13], [14], and so on. By imitating the teacher's behavior, the student model achieves performance comparable to the teacher's even with far fewer parameters.…”
Section: A. Knowledge Distillation (mentioning)
confidence: 99%
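Of the forms listed above, the soft-target loss from [1] is compact enough to show concretely. Below is a minimal PyTorch sketch of that loss; the function name and the T/alpha defaults are illustrative choices, not values prescribed by the cited papers.

```python
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation loss in the style of [1]."""
    # Temperature-softened distributions: T > 1 spreads probability mass
    # over the non-target classes, exposing how the teacher generalizes.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The T^2 factor keeps the KD gradient magnitude comparable to the
    # hard-label cross-entropy term as T grows.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

At T = 1 the KD term reduces to matching the teacher's raw output distribution; raising T is what lets the student imitate how the teacher distributes probability across wrong answers.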
“…Our goal is to teach a lightweight RL-CC tree-based policy that imitates an NN-based policy over a representative distribution of inputs. Previous work has shown that model distillation with trees can work well on various tasks [6,12,27,31,43]. The biggest challenge in distilling the policy is the LSTM layer, which specializes in incorporating past information.…”
Section: Model Distillation with Boosting Trees (mentioning)
confidence: 99%
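As a rough illustration of that setup, the sketch below fits a scikit-learn boosted-tree student to a teacher policy's actions. `nn_policy` and `observations` are hypothetical stand-ins, and flattening a fixed window of past observations is one common workaround for the LSTM's memory, not necessarily the approach taken in the cited work.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

def distill_policy_to_trees(nn_policy, observations, history=8):
    """Fit a boosted-tree student that imitates a recurrent teacher policy.

    `nn_policy` and `observations` are hypothetical stand-ins: the teacher
    consumes the full observation history (its LSTM carries the memory),
    while the tree student only sees a flat vector, so the memory is
    approximated by stacking the last `history` observations. A scalar
    action is assumed; vector actions would need one ensemble per dimension.
    """
    X, y = [], []
    for t in range(history, len(observations)):
        X.append(np.concatenate(observations[t - history:t]))  # flattened window
        y.append(nn_policy(observations[:t]))                  # teacher's action
    student = HistGradientBoostingRegressor(max_iter=200)
    student.fit(np.asarray(X), np.asarray(y))
    return student
```

The window length trades off fidelity to the LSTM's memory against the dimensionality the trees must handle; it would be tuned against the teacher on held-out trajectories.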
“…They point out that the soft targets, which reveal how the teacher tends to generalize, regularize the student. To distill knowledge more fully, FitNets [3] additionally use the teacher's intermediate features (also known as hints), and follow-up works [4]-[6] extract deeper levels of information from the teacher's intermediate layers to supervise the student in different respects.…”
Section: B. Model Reusing (mentioning)
confidence: 99%
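A minimal PyTorch sketch of such a hint loss follows. The 1x1-conv regressor matching channel counts follows the FitNets recipe from [3], but the class name, layer pairing, and feature shapes here are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """FitNets-style hint loss: a student hidden layer is regressed onto a
    teacher hidden layer. The 1x1-conv regressor resolves the channel
    mismatch; which layers to pair is a design choice left open here."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # L2 distance to the teacher features; detach() stops gradients
        # from flowing into the (frozen) teacher.
        return F.mse_loss(self.regressor(student_feat), teacher_feat.detach())
```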
“…Following this teacher-student paradigm, FitNets [3] utilize intermediate features, also known as hints, to supervise the student's hidden layers more fully. In addition, a number of methods [4]-[6] extract deeper levels of information from the intermediate layers and demonstrate promising results. Beyond image classification, researchers are exploring KD for many other tasks, such as semantic segmentation [7], [8], object detection [9]-[12], natural language processing [13]-[15], and reinforcement learning [16], [17].…”
Section: Introduction (mentioning)
confidence: 99%