2021
DOI: 10.1109/tnnls.2020.2970494

Learning Student Networks via Feature Embedding

Abstract: Deep convolutional neural networks have been widely used in numerous applications, but their demanding storage and computational resource requirements prevent their applications on mobile devices. Knowledge distillation aims to optimize a portable student network by taking the knowledge from a well-trained heavy teacher network. Traditional teacher-student based methods used to rely on additional fully-connected layers to bridge intermediate layers of teacher and student networks, which brings in a large number…
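The abstract contrasts the proposed feature-embedding approach with earlier distillation methods that bridge intermediate layers through extra learned layers. As a point of reference only, the following is a minimal sketch of that generic bridged feature-distillation setup, assuming PyTorch; the bridge module and dimensions are illustrative and are not taken from the paper.

```python
# Minimal sketch (not the paper's method): generic intermediate-feature
# distillation, where an extra "bridge" layer aligns the student's feature
# map with the teacher's before an L2 (MSE) loss is applied. The abstract
# notes that such bridge layers add many extra parameters, which the
# proposed feature-embedding approach aims to avoid.
import torch
import torch.nn as nn

class FeatureDistillationLoss(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv bridge mapping student features to the teacher's width;
        # these are the additional parameters the paper criticises.
        self.bridge = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.mse = nn.MSELoss()

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # Teacher features are detached: only the student (and bridge) are trained.
        return self.mse(self.bridge(f_student), f_teacher.detach())

# Usage with dummy feature maps (batch of 8; student 64 channels, teacher 256 channels, 7x7):
loss_fn = FeatureDistillationLoss(64, 256)
loss = loss_fn(torch.randn(8, 64, 7, 7), torch.randn(8, 256, 7, 7))
```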

Cited by 75 publications (42 citation statements)
References 41 publications

Citation statements:
“…In adding each relation model, the convolutional layers consider the impacts between each shallow part and add an L2 loss on its extracted features. According to knowledge distillation [84-87], all parts with a corresponding relation model can be regarded as student models, and the deepest part can be regarded as the teacher model.…”
Section: Proposed Methods
Mentioning confidence: 99%
“…According to knowledge distillation [84-87], all parts with a corresponding relation model can be regarded as student models, and the deepest part can be regarded as the teacher model. Relation models (the proposed self-knowledge distillation has multiple relation models within a whole network) in the neural network are denoted as …”
Mentioning confidence: 99%
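The excerpts above describe a self-distillation scheme in which shallow parts of one network act as students and the deepest part acts as the teacher, coupled through an L2 loss on extracted features. The sketch below only illustrates that idea, assuming PyTorch; the relation models are stood in by hypothetical 1x1-conv adapters, since the citing paper's exact formulation is not reproduced here.

```python
# Hedged sketch of the self-distillation idea: shallow parts are students,
# the deepest part is the teacher, matched with an L2 (MSE) loss.
# Each relation model is assumed to map its shallow feature to the shape
# of the deepest feature map (a hypothetical stand-in).
import torch
import torch.nn as nn

def self_distillation_l2(shallow_feats, deepest_feat, relation_models):
    """Sum of L2 losses between each adapted shallow feature and the deepest feature."""
    loss = 0.0
    for feat, relation in zip(shallow_feats, relation_models):
        # Adapt the shallow feature, then match the (detached) teacher feature.
        loss = loss + nn.functional.mse_loss(relation(feat), deepest_feat.detach())
    return loss

# Example with dummy features: two shallow parts adapted to a 256-channel
# deepest map by hypothetical 1x1-conv relation models.
relations = [nn.Conv2d(64, 256, 1), nn.Conv2d(128, 256, 1)]
feats = [torch.randn(4, 64, 7, 7), torch.randn(4, 128, 7, 7)]
deepest = torch.randn(4, 256, 7, 7)
loss = self_distillation_l2(feats, deepest, relations)
```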
“…The time-dependent correlation is established as a dynamic model based on the assumption that a student's learning ability in the current course is influenced only by that in the previous course, which is similar to a Markov chain [22]. Specifically, we define a function f(·) to capture the temporal variations of learning ability, which is shown in (4).…”
Section: B. Modeling the Learning Ability
Mentioning confidence: 99%
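The excerpt assumes a first-order (Markov-style) dependence: learning ability in the current course depends only on the previous course. Equation (4) of the citing paper is not reproduced above, so the transition function below is a purely hypothetical placeholder that only illustrates the recurrence structure.

```python
# Hypothetical illustration of the first-order assumption: ability at step t
# is computed from ability at step t-1 only. The linear form here is not the
# citing paper's equation (4); it merely shows the Markov-style recurrence.
def next_ability(prev_ability: float, course_difficulty: float,
                 retention: float = 0.9, gain: float = 0.1) -> float:
    """Hypothetical f(.): carry over part of the previous ability plus a gain term."""
    return retention * prev_ability + gain * course_difficulty

abilities = [0.5]                      # initial learning ability
for difficulty in [0.3, 0.6, 0.8]:     # a short course sequence
    abilities.append(next_ability(abilities[-1], difficulty))
```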
“…Recent years have witnessed the success of massive open online courses (MOOCs) and intelligent tutoring systems (ITS), which has accelerated the development of educational data mining (EDM). EDM seeks to develop methods to detect hidden communities [1], identify implicit relationships [2], explore the key factors influencing students' engagement [3], and analyze student learning behaviors and social activities [4], [5]. For instance, KDD CUP 2015 issued a challenge of predicting students' dropout rates from their personal behaviors.…”
Section: Introduction
Mentioning confidence: 99%
“…Our new method combines the advantages of several model compression methods. Compared to the latest knowledge distillation methods [2, 33], our method focuses on generating a student model from the original model by pruning. Therefore, we obtain a student network that suits the teacher network better than the manually selected networks used in simple knowledge distillation methods.…”
Section: Introduction
Mentioning confidence: 99%
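The excerpt describes deriving the student from the original network by pruning and then training it with knowledge distillation. The sketch below illustrates that combination under stated assumptions (magnitude pruning via torch.nn.utils.prune and a standard softened-logit KL loss); it is not the citing paper's exact procedure.

```python
# Hedged sketch: build a student by pruning a copy of the teacher, then train
# it with a standard knowledge-distillation loss on softened logits.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torch.nn.functional as F

def build_student_by_pruning(teacher: nn.Module, amount: float = 0.5) -> nn.Module:
    """Copy the teacher and zero out the smallest-magnitude conv weights."""
    student = copy.deepcopy(teacher)
    for module in student.modules():
        if isinstance(module, nn.Conv2d):
            # l1_unstructured masks weights to zero; it does not shrink the layer.
            prune.l1_unstructured(module, name="weight", amount=amount)
    return student

def distillation_loss(student_logits, teacher_logits, temperature: float = 4.0):
    """KL divergence between softened teacher and student predictions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)
```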