2003
DOI: 10.1016/s0893-6080(02)00219-8

Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets

Cited by 80 publications (72 citation statements) | References 15 publications
“…Its algorithms for shaping not only the linear but also the nonlinear parts allow LSTM to learn to solve tasks unlearnable by standard feedforward nets, support vector machines, hidden Markov models, and previous RNNs. Previous work on LSTM has focused on gradient-based G-LSTM (Gers & Schmidhuber, 2001; Gers et al., 2000, 2002; Graves & Schmidhuber, 2005; Hochreiter & Schmidhuber, 1997a; Pérez-Ortiz et al., 2003; Schmidhuber et al., 2002). Here we introduced the novel Evolino class of supervised learning algorithms for such nets that overcomes certain problems of gradient-based RNNs with local minima.…”
Section: Results (mentioning)
confidence: 99%
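The Evolino idea quoted in this statement is concrete enough to illustrate: evolve the weights of the nonlinear recurrent part, and at each fitness evaluation solve for the linear output weights exactly by least squares. Below is a minimal, assumption-laden sketch in Python; a plain tanh RNN with random hill-climbing stands in for the LSTM and Enforced SubPopulations of the actual Evolino work, and the toy task and all hyperparameters are invented for illustration.

```python
# Sketch of the Evolino split: evolve the nonlinear recurrent weights,
# solve the linear readout analytically. Not the published algorithm.
import numpy as np

rng = np.random.default_rng(0)

def hidden_states(W_in, W_rec, inputs):
    """Run a tanh RNN over `inputs` (T x d) and collect hidden states (T x h)."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return np.array(states)

def evaluate(W_in, W_rec, inputs, targets):
    """Fitness = residual of the *optimal* linear readout (least squares)."""
    H = hidden_states(W_in, W_rec, inputs)
    W_out, *_ = np.linalg.lstsq(H, targets, rcond=None)  # linear part solved exactly
    return np.mean((H @ W_out - targets) ** 2), W_out

# Toy task: predict a phase-shifted sine from the current sine value.
t = np.linspace(0, 8 * np.pi, 400)
inputs = np.sin(t)[:, None]
targets = np.sin(t + 0.1)[:, None]

h = 10
W_in, W_rec = rng.normal(0, 0.5, (h, 1)), rng.normal(0, 0.5, (h, h))
best, _ = evaluate(W_in, W_rec, inputs, targets)
for _ in range(200):  # crude stand-in for evolution: keep mutations that lower the error
    cand_in = W_in + rng.normal(0, 0.05, W_in.shape)
    cand_rec = W_rec + rng.normal(0, 0.05, W_rec.shape)
    err, _ = evaluate(cand_in, cand_rec, inputs, targets)
    if err < best:
        best, W_in, W_rec = err, cand_in, cand_rec
print(f"best training MSE: {best:.6f}")
```

The design point the excerpt makes is visible here: because the linear part is fit exactly at every evaluation, the evolutionary search only has to shape the nonlinear dynamics, which sidesteps the local minima that plague pure gradient descent through the recurrence.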
“…Using gradient-based learning for both linear and nonlinear nodes, LSTM networks can efficiently solve many tasks that were previously unlearnable using RNNs (e.g., Gers & Schmidhuber, 2001; Gers, Schmidhuber, & Cummins, 2000; Gers, Schraudolph, & Schmidhuber, 2002; Graves & Schmidhuber, 2005; Hochreiter & Schmidhuber, 1997a; Pérez-Ortiz, Gers, Eck, & Schmidhuber, 2003; Schmidhuber, Gers, & Eck, 2002).…”
Section: Introduction (mentioning)
confidence: 99%
“…LSTM generalized well, though, requiring only the 30 shortest exemplars (n ≤ 10) of the CSL a^n b^n c^n to correctly predict the possible continuations of sequence prefixes for n up to 1000 and more. A combination of a decoupled extended Kalman filter (Kalman, 1960; Williams, 1992b; Puskorius and Feldkamp, 1994; Feldkamp et al., 1998; Haykin, 2001; Feldkamp et al., 2003) and an LSTM RNN (Pérez-Ortiz et al., 2003) learned to deal correctly with values of n up to 10 million and more. That is, after training the network was able to read sequences of 30,000,000 symbols and more, one symbol at a time, and finally detect the subtle differences between legal strings such as a^{10,000,000} b^{10,000,000} c^{10,000,000} and very similar but illegal strings such as a^{10,000,000} b^{9,999,999} c^{10,000,000}.…”
Section: Supervised Recurrent Very Deep Learner (LSTM RNN) (mentioning)
confidence: 99%
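The prediction task this statement describes is easy to make precise. Below is a short Python sketch of the a^n b^n c^n data and targets; the start/end delimiters "S"/"E" and the target convention (the set of symbols that may legally come next after each prefix) are assumptions for illustration, not necessarily the paper's exact encoding.

```python
# Sketch of the a^n b^n c^n next-symbol prediction task from the excerpt
# above. "S"/"E" delimiters and the target convention are assumptions.
def anbncn(n):
    """One legal string of the context-sensitive language a^n b^n c^n (n >= 1)."""
    return "S" + "a" * n + "b" * n + "c" * n + "E"

def legal_continuations(prefix):
    """Symbols that can legally follow `prefix` in some string S a^n b^n c^n E."""
    a, b, c = prefix.count("a"), prefix.count("b"), prefix.count("c")
    if prefix == "S":
        return {"a"}          # assuming n >= 1, the string must start with 'a'
    if b == 0 and c == 0:
        return {"a", "b"}     # n is still unknown: one more 'a' or the first 'b'
    if b < a:
        return {"b"}          # must finish the b-block
    if c < a:
        return {"c"}          # must finish the c-block
    return {"E"}              # string complete

# Check every actual next symbol against the predicted legal set.
s = anbncn(3)
for i in range(1, len(s)):
    assert s[i] in legal_continuations(s[:i])
    print(s[:i], "->", sorted(legal_continuations(s[:i])))
```

Running this on anbncn(3) shows why the language defeats purely reactive learners: after the a-block, the ambiguous target {a, b} gives way to forced symbols only if the network has actually counted the a's, and at n in the millions that count must survive tens of millions of time steps.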
“…Single-task (1-dim) learning compared to multi-task learning (5-dim). In future work, using other training algorithms, such as Extended Kalman Filter (EKF) training [Pérez-Ortiz et al. 2003], might prove promising in this respect.…”
Section: Dimension Feature Set Topology (mentioning)
confidence: 99%
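For readers unfamiliar with the EKF training mentioned in this statement: the trick is to treat the network's weight vector as the hidden state of a Kalman filter, with the network output as the nonlinear measurement of that state. The Python sketch below trains a single tanh neuron with a global EKF; it is a simplified stand-in for the decoupled EKF applied to LSTM in Pérez-Ortiz et al. (2003), and the noise parameters q and r, the toy target, and the step count are illustrative assumptions.

```python
# Sketch of EKF-style weight updating: the weights are the filter state,
# the network output is the measurement. Global (not decoupled) variant.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(0, 0.1, 3)            # weights are the EKF "state"
P = np.eye(3) * 100.0                # state covariance (large = very uncertain)
q, r = 1e-4, 1e-2                    # process / measurement noise (assumed)

def f(w, x):
    return np.tanh(w @ x)

for step in range(2000):
    x = np.append(rng.uniform(-1, 1, 2), 1.0)   # two inputs plus a bias term
    y = np.tanh(0.8 * x[0] - 0.5 * x[1] + 0.2)  # target from a "true" neuron
    y_hat = f(w, x)
    H = (1.0 - y_hat ** 2) * x                  # Jacobian d y_hat / d w (row)
    P = P + q * np.eye(3)                       # time update: random-walk weights
    s = H @ P @ H + r                           # innovation variance (scalar)
    K = P @ H / s                               # Kalman gain
    w = w + K * (y - y_hat)                     # measurement update
    P = P - np.outer(K, H) @ P                  # covariance update

print("learned weights:", np.round(w, 3))       # should approach [0.8, -0.5, 0.2]
```

The per-weight uncertainty carried in P is what makes EKF training attractive for the hard long-lag problems above: unlike plain gradient descent, each update is scaled by how much the filter still distrusts each weight, which is the second-order effect the decoupled EKF approximates blockwise to stay affordable on larger networks.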