2019
DOI: 10.1109/access.2019.2937139
Adaptive Weight Decay for Deep Neural Networks

Abstract: Regularization in the optimization of deep neural networks is often critical to avoid undesirable over-fitting and to improve the generalization of the model. One of the most popular regularization techniques is to impose an L2 penalty on the model parameters, resulting in the decay of the parameters, called weight decay; the decay rate is generally kept constant for all the model parameters over the course of optimization. In contrast to the previous approach based on a constant rate of weight decay, we propose to consider t…
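The abstract contrasts adaptive weight decay with the conventional constant-rate scheme. As a point of reference, the following is a minimal sketch (assuming PyTorch; not the paper's algorithm) of a plain SGD update with a constant L2 weight-decay rate applied uniformly to every parameter; the paper's proposal would instead adapt this fixed rate during optimization.

    import torch

    def sgd_step_constant_weight_decay(params, lr=0.1, wd=1e-3):
        """One SGD step with the same constant weight-decay rate `wd` for all parameters."""
        with torch.no_grad():
            for p in params:
                if p.grad is None:
                    continue
                # The L2 penalty (wd/2)*||p||^2 adds wd*p to the gradient,
                # so every parameter decays toward zero at the same constant rate.
                p.sub_(lr * (p.grad + wd * p))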

Cited by 39 publications (19 citation statements) · References 25 publications
“…To further address this problem, the network's architecture could be expanded to include one or several dropout layers, which randomly drop connections between layers during training and lessen their linkage (28). Other methods might include “early stopping”, which halts training of the network once peak performance is reached, or “weight decay”, which continually decreases the weights of the network during the training phase (29).…”
Section: Discussion
confidence: 99%
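The statement above lists three regularizers (dropout, early stopping, weight decay). A brief illustrative sketch of how they typically appear together in a training setup follows; the layer sizes, learning rate, patience, and the `train_and_validate` helper are hypothetical and not taken from the cited works.

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(64, 128), nn.ReLU(),
        nn.Dropout(p=0.5),                      # dropout: randomly zeroes activations during training
        nn.Linear(128, 10),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                weight_decay=1e-3)  # L2 penalty, i.e. weight decay

    best_val, patience, bad_epochs = float("inf"), 5, 0
    for epoch in range(100):
        val_loss = train_and_validate(model, optimizer)  # hypothetical helper
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:          # early stopping on stalled validation loss
                break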
“…SGD is important as it updates the parameters with mini-batches of B = 10 examples. The momentum was set to 9 × 10⁻¹ and the weight decay was set to 1 × 10⁻³, as the network is considered a shallow network [31]. The marginal weight-decay value is important as it helps to minimize the model's training error [31].…”
Section: Network Training Phase
confidence: 99%
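The hyperparameters quoted above map directly onto a standard SGD configuration. A hedged sketch (assuming PyTorch; `model`, `train_dataset`, and the learning rate are placeholders not given in the excerpt):

    import torch

    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.01,            # assumed; not stated in the excerpt
                                momentum=0.9,       # 9 × 10⁻¹, as quoted
                                weight_decay=1e-3)  # 1 × 10⁻³, as quoted
    loader = torch.utils.data.DataLoader(train_dataset,
                                         batch_size=10,  # mini-batch B = 10
                                         shuffle=True)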
“…The momentum was set to 9 × 10⁻¹ and the weight decay was set to 1 × 10⁻³, as the network is considered a shallow network [31]. The marginal weight-decay value is important as it helps to minimize the model's training error [31]. Training and evaluation were performed on an Intel Core i5 machine at 2.9 GHz with 8 GB of RAM.…”
Section: Network Training Phase
confidence: 99%
“…The slow convergence of the PINN due to the presence of noise in the data is circumvented by weight decay [18], which bounds the weights of the neural network and hence results in faster convergence. Therefore, the loss function in equation (10) is further modified and expressed as…”
Section: A PINN for Bi-Crystal Nickel
confidence: 99%
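Equation (10) itself is not reproduced in the excerpt, so the following is only a generic sketch of how a PINN loss can be augmented with an L2 weight penalty of the kind the statement describes; the names `pinn_loss` and `wd` are illustrative, not the cited paper's notation.

    import torch

    def regularized_loss(pinn_loss: torch.Tensor, model: torch.nn.Module, wd: float = 1e-4) -> torch.Tensor:
        # Add an L2 penalty on the network weights to the physics-informed loss,
        # which bounds the weights during training (i.e., weight decay).
        l2 = sum((p ** 2).sum() for p in model.parameters())
        return pinn_loss + 0.5 * wd * l2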