2003
DOI: 10.1016/s0893-6080(02)00219-8

Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets

Cited by 80 publications (72 citation statements) | References 15 publications
“…Its algorithms for shaping not only the linear but also the nonlinear parts allow LSTM to learn to solve tasks unlearnable by standard feedforward nets, support vector machines, hidden Markov models, and previous RNNs. Previous work on LSTM has focused on gradient-based G-LSTM (Gers & Schmidhuber, 2001; Gers et al., 2000, 2002; Graves & Schmidhuber, 2005; Hochreiter & Schmidhuber, 1997a; Pérez-Ortiz et al., 2003; Schmidhuber et al., 2002). Here we introduced the novel Evolino class of supervised learning algorithms for such nets that overcomes certain problems of gradient-based RNNs with local minima.…”
Section: Results (mentioning)
confidence: 99%
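The Evolino idea quoted in this statement is concrete enough to illustrate: evolve the weights of the nonlinear recurrent part, and at each fitness evaluation solve for the linear output weights exactly by least squares. Below is a minimal, assumption-laden sketch in Python; a plain tanh RNN with random hill-climbing stands in for the LSTM and Enforced SubPopulations of the actual Evolino work, and the toy task and all hyperparameters are invented for illustration.

```python
# Sketch of the Evolino split: evolve the nonlinear recurrent weights,
# solve the linear readout analytically. Not the published algorithm.
import numpy as np

rng = np.random.default_rng(0)

def hidden_states(W_in, W_rec, inputs):
    """Run a tanh RNN over `inputs` (T x d) and collect hidden states (T x h)."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return np.array(states)

def evaluate(W_in, W_rec, inputs, targets):
    """Fitness = residual of the *optimal* linear readout (least squares)."""
    H = hidden_states(W_in, W_rec, inputs)
    W_out, *_ = np.linalg.lstsq(H, targets, rcond=None)  # linear part solved exactly
    return np.mean((H @ W_out - targets) ** 2), W_out

# Toy task: predict a phase-shifted sine from the current sine value.
t = np.linspace(0, 8 * np.pi, 400)
inputs = np.sin(t)[:, None]
targets = np.sin(t + 0.1)[:, None]

h = 10
W_in, W_rec = rng.normal(0, 0.5, (h, 1)), rng.normal(0, 0.5, (h, h))
best, _ = evaluate(W_in, W_rec, inputs, targets)
for _ in range(200):  # crude stand-in for evolution: keep mutations that lower the error
    cand_in = W_in + rng.normal(0, 0.05, W_in.shape)
    cand_rec = W_rec + rng.normal(0, 0.05, W_rec.shape)
    err, _ = evaluate(cand_in, cand_rec, inputs, targets)
    if err < best:
        best, W_in, W_rec = err, cand_in, cand_rec
print(f"best training MSE: {best:.6f}")
```

The design point the excerpt makes is visible here: because the linear part is fit exactly at every evaluation, the evolutionary search only has to shape the nonlinear dynamics, which sidesteps the local minima that plague pure gradient descent through the recurrence.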
“…Using gradient-based learning for both linear and nonlinear nodes, LSTM networks can efficiently solve many tasks that were previously unlearnable using RNNs (e.g., Gers & Schmidhuber, 2001; Gers, Schmidhuber, & Cummins, 2000; Gers, Schraudolph, & Schmidhuber, 2002; Graves & Schmidhuber, 2005; Hochreiter & Schmidhuber, 1997a; Pérez-Ortiz, Gers, Eck, & Schmidhuber, 2003; Schmidhuber, Gers, & Eck, 2002).…”
Section: Introduction (mentioning)
confidence: 99%
“…LSTM generalized well, though, requiring only the 30 shortest exemplars (n ≤ 10) of the CSL a^n b^n c^n to correctly predict the possible continuations of sequence prefixes for n up to 1000 and more. A combination of a decoupled extended Kalman filter (Kalman, 1960; Williams, 1992b; Puskorius and Feldkamp, 1994; Feldkamp et al., 1998; Haykin, 2001; Feldkamp et al., 2003) and an LSTM RNN (Pérez-Ortiz et al., 2003) learned to deal correctly with values of n up to 10 million and more. That is, after training the network was able to read sequences of 30,000,000 symbols and more, one symbol at a time, and finally detect the subtle differences between legal strings such as a^{10,000,000} b^{10,000,000} c^{10,000,000} and very similar but illegal strings such as a^{10,000,000} b^{9,999,999} c^{10,000,000}.…”
Section: Supervised Recurrent Very Deep Learner (LSTM RNN) (mentioning)
confidence: 99%
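The prediction task this statement describes is easy to make precise. Below is a short Python sketch of the a^n b^n c^n data and targets; the start/end delimiters "S"/"E" and the target convention (the set of symbols that may legally come next after each prefix) are assumptions for illustration, not necessarily the paper's exact encoding.

```python
# Sketch of the a^n b^n c^n next-symbol prediction task from the excerpt
# above. "S"/"E" delimiters and the target convention are assumptions.
def anbncn(n):
    """One legal string of the context-sensitive language a^n b^n c^n (n >= 1)."""
    return "S" + "a" * n + "b" * n + "c" * n + "E"

def legal_continuations(prefix):
    """Symbols that can legally follow `prefix` in some string S a^n b^n c^n E."""
    a, b, c = prefix.count("a"), prefix.count("b"), prefix.count("c")
    if prefix == "S":
        return {"a"}          # assuming n >= 1, the string must start with 'a'
    if b == 0 and c == 0:
        return {"a", "b"}     # n is still unknown: one more 'a' or the first 'b'
    if b < a:
        return {"b"}          # must finish the b-block
    if c < a:
        return {"c"}          # must finish the c-block
    return {"E"}              # string complete

# Check every actual next symbol against the predicted legal set.
s = anbncn(3)
for i in range(1, len(s)):
    assert s[i] in legal_continuations(s[:i])
    print(s[:i], "->", sorted(legal_continuations(s[:i])))
```

Running this on anbncn(3) shows why the language defeats purely reactive learners: after the a-block, the ambiguous target {a, b} gives way to forced symbols only if the network has actually counted the a's, and at n in the millions that count must survive tens of millions of time steps.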
“…Single-task (1-dim) learning compared to multi-task learning (5-dim). In future work, using other training algorithms, such as Extended Kalman Filter (EKF) training [Pérez-Ortiz et al. 2003], might prove promising in this respect.…”
Section: Dimension Feature Set Topology (mentioning)
confidence: 99%
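For readers unfamiliar with the EKF training mentioned in this statement: the trick is to treat the network's weight vector as the hidden state of a Kalman filter, with the network output as the nonlinear measurement of that state. The Python sketch below trains a single tanh neuron with a global EKF; it is a simplified stand-in for the decoupled EKF applied to LSTM in Pérez-Ortiz et al. (2003), and the noise parameters q and r, the toy target, and the step count are illustrative assumptions.

```python
# Sketch of EKF-style weight updating: the weights are the filter state,
# the network output is the measurement. Global (not decoupled) variant.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(0, 0.1, 3)            # weights are the EKF "state"
P = np.eye(3) * 100.0                # state covariance (large = very uncertain)
q, r = 1e-4, 1e-2                    # process / measurement noise (assumed)

def f(w, x):
    return np.tanh(w @ x)

for step in range(2000):
    x = np.append(rng.uniform(-1, 1, 2), 1.0)   # two inputs plus a bias term
    y = np.tanh(0.8 * x[0] - 0.5 * x[1] + 0.2)  # target from a "true" neuron
    y_hat = f(w, x)
    H = (1.0 - y_hat ** 2) * x                  # Jacobian d y_hat / d w (row)
    P = P + q * np.eye(3)                       # time update: random-walk weights
    s = H @ P @ H + r                           # innovation variance (scalar)
    K = P @ H / s                               # Kalman gain
    w = w + K * (y - y_hat)                     # measurement update
    P = P - np.outer(K, H) @ P                  # covariance update

print("learned weights:", np.round(w, 3))       # should approach [0.8, -0.5, 0.2]
```

The per-weight uncertainty carried in P is what makes EKF training attractive for the hard long-lag problems above: unlike plain gradient descent, each update is scaled by how much the filter still distrusts each weight, which is the second-order effect the decoupled EKF approximates blockwise to stay affordable on larger networks.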