2021
DOI: 10.1109/access.2021.3054915

Handling Vanishing Gradient Problem Using Artificial Derivative

Abstract: The sigmoid function and ReLU are commonly used activation functions in neural networks (NN). However, the sigmoid function is vulnerable to the vanishing gradient problem, while ReLU suffers from a particular form of it known as the dying ReLU problem. Although many studies have proposed methods to alleviate this problem, an efficient and feasible solution has been lacking. Hence, we proposed a method that replaces the original derivative function with an artificial derivative in a pertinent way. Our method optimized gradie…
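The abstract is truncated above, so the paper's exact artificial derivative is not specified here. The snippet below is a minimal illustrative sketch of the general idea only, not the authors' construction: it keeps the ordinary ReLU forward pass but backpropagates through a hypothetical surrogate derivative that assigns a small constant slope to the dead region, using PyTorch's torch.autograd.Function. The class name SurrogateReLU and the 0.05 slope are assumptions made for illustration.

```python
import torch

class SurrogateReLU(torch.autograd.Function):
    """ReLU forward pass paired with an artificial (surrogate) derivative.

    The true ReLU derivative is 0 for x <= 0, which can leave units
    permanently inactive ("dying ReLU"). The backward pass below instead
    uses a small constant slope on the negative side so some gradient
    still flows. The 0.05 slope is an illustrative choice, not a value
    taken from the paper.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0.0)  # ordinary ReLU output

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Artificial derivative: 1 where x > 0, small positive slope otherwise.
        artificial_grad = torch.where(
            x > 0, torch.ones_like(x), torch.full_like(x, 0.05)
        )
        return grad_output * artificial_grad


# Usage: drop-in replacement for torch.relu inside a model's forward pass.
x = torch.randn(4, requires_grad=True)
SurrogateReLU.apply(x).sum().backward()
print(x.grad)  # negative inputs receive a small nonzero gradient
```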

Cited by 52 publications (18 citation statements)
References 7 publications
“…The selected model implements DNN with hidden LSTM layers ( Figure 7 ). We used the rectified linear activation function (ReLu), since it overcomes the vanishing gradient problems present in RNNs [ 46 , 47 ]. It also allows models to learn faster and perform better.…”
Section: Methods (mentioning)
Confidence: 99%
“…The ReLU expedites the training and avoids the vanishing gradient [ 49 ]. The last layer in the network is called the output layer (classification layer), which gives the probability of occurrence of different classes.…”
Section: Methods (mentioning)
Confidence: 99%
“…In this network, the earliest layers of the design employ depth-wise separable convolutions to speed up the calculations involved in down sampling the input pictures. In order to increase convergence during training, they also devised a batch normalization layer that may reduce internal covariate shift and address the gradient vanishing problem [31]. ResNet50 is a slimmer iteration of ResNet101, which took first place in the ILSVRC classification challenge.…”
Section: Methods (mentioning)
Confidence: 99%
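For context on the mechanism the last quoted passage appeals to, below is a minimal batch-normalization forward pass in NumPy. It is a generic sketch rather than the implementation used by the citing paper; the function name batch_norm_forward and the toy input are illustrative assumptions.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch dimension.

    Standardizing each feature to zero mean and unit variance keeps
    pre-activations in a range where gradients do not shrink toward
    zero, which is why batch normalization is credited with easing the
    vanishing-gradient problem. gamma and beta are the learned scale
    and shift parameters.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardized features
    return gamma * x_hat + beta              # restore representational capacity


# Example: a batch of 8 samples with 4 badly scaled features each.
x = np.random.randn(8, 4) * 10 + 3
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1
```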