Published 2020 | DOI: 10.18034/ei.v8i2.570

The Difficulty of Learning Long-Term Dependencies with Gradient Flow in Recurrent Nets

Abstract: In theory, recurrent networks (RN) can leverage their feedback connections to store activations as representations of recent input events. The most extensively used methods for learning what to put in short-term memory, on the other hand, take far too long to be practicable or do not work at all, especially when the time lags between inputs and instructor signals are long. They do not provide significant practical advantages over backprop in feedforward networks with limited time windows, despite being th…
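Why long time lags are so damaging can be made explicit with the standard gradient-flow argument. The following is a minimal sketch in generic notation (the recurrence and symbols are assumptions chosen for illustration, not taken from the abstract itself):

```latex
% Sketch of the standard vanishing/exploding-gradient argument (illustrative notation).
% Hidden-state recurrence: h_t = f(a_t),  a_t = W h_{t-1} + U x_t.
% The error signal that travels back over a lag of k steps is scaled by a product of Jacobians:
\[
\frac{\partial h_t}{\partial h_{t-k}}
  \;=\; \prod_{i=t-k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
  \;=\; \prod_{i=t-k+1}^{t} \operatorname{diag}\!\bigl(f'(a_i)\bigr)\, W .
\]
% If the largest singular value of every factor is bounded by some \lambda < 1, the norm of the
% product shrinks like \lambda^{k}, so gradients vanish exponentially with the lag k;
% if it exceeds 1, the product can instead grow exponentially (exploding gradients).
```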

Cited by 13 publications (6 citation statements) | References 22 publications
“…Thanks to the LSTM memory unit, vanishing or exploding gradients, caused by the backpropagation through time in vanilla RNN training, which hampers learning of long term dependency in data sequences, can be avoided in the long-term learning process. 34,35…”
Section: Methods (mentioning)
confidence: 99%
“…Thanks to the LSTM memory unit, vanishing or exploding gradients, caused by the backpropagation through time in vanilla RNN training, which hampers learning of long term dependency in data sequences, can be avoided in the long-term learning process. 34,35 The LSTM neural network has an internal memory that can learn long-term dependencies of sequential data. The LSTM unit (Fig.…”
Section: LSTM and Residual Connection (mentioning)
confidence: 99%
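To make the quoted point concrete, here is a minimal sketch in PyTorch (the model, layer sizes, and sequence length are illustrative assumptions, not taken from the cited papers) of an LSTM whose gated cell state lets gradients reach inputs that lie hundreds of steps in the past:

```python
# Illustrative sketch: an LSTM layer whose gated cell state carries information
# across long time lags, mitigating vanishing gradients in a vanilla RNN.
import torch
import torch.nn as nn

class SequenceRegressor(nn.Module):
    def __init__(self, n_features: int = 8, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)              # out: (batch, time, hidden)
        return self.head(out[:, -1])       # prediction from the last time step

model = SequenceRegressor()
x = torch.randn(4, 200, 8)                 # a 200-step sequence: a long lag for a plain RNN
y = model(x)
y.sum().backward()                         # gradients flow back through the gated cell state
```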
“…RNNs have incredible success handling S2S tasks [43], where a decision at a time step t − 1 is affected by that of a time step t, signifying a temporal dependency. However, RNNs suffer from vanishing and exploding gradients and cannot capture long-term dependencies effectively [44], [45]. Gradient vanishing refers to the case where the gradient norm for long-term relationships decreases exponentially to zero, inhibiting the learning of long-term temporal relationships.…”
Section: F. CNN-GRU Model (mentioning)
confidence: 99%
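The exponential decay of the gradient norm described in this excerpt can be observed directly. The snippet below is an illustrative sketch (an untrained vanilla tanh RNN in PyTorch; all sizes are assumptions) that measures the gradient reaching the first input as the sequence grows; the norm typically shrinks sharply with the lag:

```python
# Illustrative sketch: gradient of the final output w.r.t. the first input of a vanilla RNN,
# for increasing sequence lengths, to show how the gradient norm decays with the lag.
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=32, nonlinearity='tanh', batch_first=True)

for seq_len in (10, 50, 200):
    x = torch.randn(1, seq_len, 4, requires_grad=True)
    out, _ = rnn(x)
    out[:, -1].sum().backward()               # backprop from the final time step
    early_grad = x.grad[0, 0].norm().item()   # gradient that reaches the first input
    print(f"seq_len={seq_len:4d}  grad norm at x_1: {early_grad:.2e}")
```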
“…On the other hand, the chain structure design of RNNs, which strictly follow the chronological development, makes RNN models unable to predict the future and thus cannot capture the potential causal relationships [16] between traffic events. In addition, the signal of RNN must propagate along the longest long path in the network [17], [18], which leads to the more extended the way in the network, the more likely it is to lose some vital information.…”
Section: Introduction (mentioning)
confidence: 99%