2020
DOI: 10.1007/978-3-030-39575-9_6

Guided Layer-Wise Learning for Deep Models Using Side Information

Abstract: Training of deep models for classification tasks is hindered by local minima problems and vanishing gradients, while unsupervised layer-wise pretraining does not exploit information from class labels. Here, we propose a new regularization technique, called diversifying regularization (DR), which applies a penalty on hidden units at any layer if they obtain similar features for different types of data. For generative models, DR is defined as a divergence over the variational posterior distributions and included i…
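The abstract describes DR as a penalty on hidden units that produce similar features for different classes. The sketch below shows one minimal way such a penalty could be written in PyTorch; the function name diversifying_penalty, the use of cosine similarity between class-wise mean activations, and the weighting lambda_dr are illustrative assumptions, not the paper's exact formulation (which, for generative models, defines DR as a divergence over variational posteriors).

```python
# Hypothetical sketch of a diversifying-regularization-style penalty.
# Assumption: dissimilarity between classes is measured via cosine similarity
# of class-conditional mean activations; the paper's own definition uses
# divergences over variational posteriors and is not reproduced here.
import torch
import torch.nn.functional as F

def diversifying_penalty(hidden: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Penalize hidden units that respond similarly to different classes.

    hidden: (batch, features) activations of one layer
    labels: (batch,) integer class labels
    """
    classes = labels.unique()
    # Mean activation per class -> (num_classes, features)
    centroids = torch.stack([hidden[labels == c].mean(dim=0) for c in classes])
    centroids = F.normalize(centroids, dim=1)
    sim = centroids @ centroids.T                      # pairwise cosine similarity
    off_diag = sim - torch.eye(len(classes), device=sim.device)
    # High similarity between different classes contributes to the penalty.
    return off_diag.clamp(min=0).sum() / max(len(classes) * (len(classes) - 1), 1)

# Usage (hypothetical): total_loss = task_loss + lambda_dr * diversifying_penalty(h, y)
```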

Cited by 1 publication (1 citation statement)
References 19 publications
“…One solution is to increase the amount of training data so that the gradient vector spans the total epochs of the training. Many methods have been proposed, such as alternate weight initialization schemes [182], unsupervised pre-training [183], guided layer-wise training [184] and variations on gradient descent. The authors used ReLU, which prevents the gradient from diminishing.…”
Section: Challenges in Deep Learning Architectures
Confidence: 99%
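The excerpt credits ReLU with preventing the gradient from diminishing. The short PyTorch sketch below (not taken from the cited works; depth and width are arbitrary assumptions) illustrates the point by comparing the input gradient after stacking sigmoid versus ReLU activations.

```python
# Illustrative comparison: gradient magnitude at the input after passing
# through a stack of identical activations. With no weights in between,
# any shrinkage comes from the activation derivative alone.
import torch

depth, width = 20, 64
x = torch.randn(1, width, requires_grad=True)

def stacked(act, x):
    h = x
    for _ in range(depth):
        h = act(h)
    return h.sum()

for name, act in [("sigmoid", torch.sigmoid), ("relu", torch.relu)]:
    x.grad = None
    stacked(act, x).backward()
    # Sigmoid derivatives are at most 0.25, so the product vanishes with depth;
    # ReLU passes a gradient of 1 wherever the input stays positive.
    print(name, x.grad.abs().mean().item())
```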