2000
DOI: 10.1162/089976600300015015

Learning to Forget: Continual Prediction with LSTM

Abstract: Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" […]
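
For readers skimming the truncated abstract, the forget gate can be summarized by the now-standard LSTM cell update below. This is a sketch in modern notation, not necessarily the exact formulation or symbols used in the paper (the original works with cell blocks and, in later variants, peephole connections):

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)            (forget gate)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)            (input gate)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)     (candidate cell input)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t      (cell state update)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)            (output gate)
h_t = o_t \odot \tanh(c_t)                           (hidden output)

Because f_t is learned, the network can drive it toward zero to reset c_t when the input stream itself signals the end of a subsequence, which is the remedy the abstract refers to.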

Cited by 4,503 publications (2,575 citation statements). References 13 publications.

“…LSTM had more successful runs, and learns much faster, than real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking. However, Gers et al [7] identified a weakness in LSTM networks processing continual input streams that were not a priori segmented into subsequences with explicitly marked ends where the internal state of the network could be reset. Without resets, the state could grow indefinitely and eventually cause the network to break down.…”
Section: Recurrent Neural Networks (RNNs)
Citation type: mentioning, confidence: 99%
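
The quoted weakness follows directly from the original LSTM cell recurrence, which has no decay term. As a rough sketch (modern notation, not the papers' own symbols), the pre-forget-gate update is

c_t = c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)

so on a continual, unsegmented stream the additive term keeps accumulating and the magnitude of c_t can drift without bound unless something external resets it; the learned forget gate f_t multiplying c_{t-1} is what removes the need for such external resets.
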
“…The input, output and forget gates are connected via "peepholes". For a full specification of the LSTM model we refer to (Hochreiter & Schmidhuber, 1997) and (Gers et al, 2000).…”
Section: Long Short-Term Memory
Citation type: mentioning, confidence: 99%
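
The peephole connections mentioned above let each gate see the cell state directly rather than only the hidden output. A hedged sketch of the peephole formulation commonly attributed to Gers and colleagues (the weight names p_f, p_i, p_o below are illustrative, not the papers' notation):

f_t = \sigma(W_f x_t + U_f h_{t-1} + p_f \odot c_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + p_i \odot c_{t-1} + b_i)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)
o_t = \sigma(W_o x_t + U_o h_{t-1} + p_o \odot c_t + b_o)
h_t = o_t \odot \tanh(c_t)

Note that the input and forget gates peek at the previous cell state c_{t-1}, while the output gate peeks at the freshly updated c_t.
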
“…2): one representing a cortical area (frontal or parietal) that learns the environment via unsupervised learning mechanisms, and one representing the basal ganglia and the dopaminergic system responsible for upcoming reward estimates and reward estimation errors. The cortical area was modeled using LSTM networks (Hochreiter and Schmidhuber 1997; Gers et al 2000; Gers et al 2002) to learn its environment. LSTM is a general neural network learning algorithm used in a wide range of machine learning applications (Eck and Schmidhuber 2002; Bakker 2002) that implements working memory in an intuitive way using gated recurrent loop mechanisms.…”
Section: The Model
Citation type: mentioning, confidence: 99%
“…2 and the signal is fed back (as part of the gradient of the error function) into the LSTM for weight updates. The model uses the full form of the LSTM network that can be found in (Hochreiter and Schmidhuber 1997; Gers et al 2000; Gers et al 2002). An LSTM network consists of a set of inputs, memory neurons (called state cells in the LSTM literature), gates and a bank of outputs.…”
Section: LSTM Model of the Cortex
Citation type: mentioning, confidence: 99%
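
To make the structure described in that excerpt concrete (inputs, memory cells, gates, outputs), here is a minimal NumPy sketch of one forward step of a forget-gate LSTM cell. It is an illustrative toy, not the cited cortical model or the exact equations of the papers above; all function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a forget-gate LSTM cell.

    Illustrative sketch only; the weight layout and names are hypothetical,
    not taken from Hochreiter & Schmidhuber (1997) or Gers et al. (2000).
    W, U, b stack the forget/input/output gates and the candidate cell input.
    """
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0 * n:1 * n])   # forget gate: how much old cell state to keep
    i = sigmoid(z[1 * n:2 * n])   # input gate: how much new input to write
    o = sigmoid(z[2 * n:3 * n])   # output gate: how much cell state to expose
    g = np.tanh(z[3 * n:4 * n])   # candidate cell input
    c = f * c_prev + i * g        # learned forgetting enables self-resets
    h = o * np.tanh(c)            # hidden output passed to the next step
    return h, c

# Toy usage on a continual stream with no external state resets.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = 0.1 * rng.standard_normal((4 * n_hid, n_in))
U = 0.1 * rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):
    x = rng.standard_normal(n_in)
    h, c = lstm_step(x, h, c, W, U, b)
print("h:", h, "c:", c)
```

In the cited model the error gradient would be backpropagated through these same gate activations to update W, U and b; that training step is omitted here.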