2001
DOI: 10.1109/72.963769

LSTM recurrent networks learn simple context-free and context-sensitive languages

Abstract: Previous work on learning regular languages from exemplary training sequences showed that long short-term memory (LSTM) outperforms traditional recurrent neural networks (RNNs). We demonstrate LSTM's superior performance on context-free language benchmarks for RNNs, and show that it works even better than previous hardwired or highly specialized architectures. To the best of our knowledge, LSTM variants are also the first RNNs to learn a simple context-sensitive language, namely a^n b^n c^n.
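To make the benchmark concrete, the following minimal Python sketch (not the authors' exact experimental setup) shows how strings of the form a^n b^n c^n can be generated and framed as a one-step look-ahead prediction task, the usual formulation in this line of work: the network reads one symbol at a time and must predict the next one. The alphabet ordering, one-hot encoding, and range of n are illustrative assumptions.

# Illustrative sketch only: data generation for next-symbol prediction on a^n b^n c^n.
import random

SYMBOLS = ["a", "b", "c"]                      # assumed alphabet
INDEX = {s: i for i, s in enumerate(SYMBOLS)}  # one-hot positions


def make_string(n):
    """Return the string a^n b^n c^n as a list of symbols."""
    return ["a"] * n + ["b"] * n + ["c"] * n


def one_hot(symbol):
    """One-hot encode a single symbol."""
    vec = [0.0] * len(SYMBOLS)
    vec[INDEX[symbol]] = 1.0
    return vec


def prediction_pairs(string):
    """Yield (input, target) pairs for one-step look-ahead prediction:
    at each position the target is simply the next symbol."""
    for current, nxt in zip(string[:-1], string[1:]):
        yield one_hot(current), one_hot(nxt)


if __name__ == "__main__":
    random.seed(0)
    # Train on small n; generalization would then be tested on larger, unseen n.
    for n in random.sample(range(1, 11), 3):
        s = make_string(n)
        print(f"n={n}:", "".join(s))
        for x, y in prediction_pairs(s):
            pass  # feed (x, y) to an RNN/LSTM trainer of your choice here

The interesting part of the task is that n is not announced to the network, so it must count the a's internally to predict where the b's and c's end.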


Cited by 577 publications (294 citation statements)
References 19 publications

Citation statements:
“…It is possible to evolve good problem-specific topologies (Bayer et al., 2009). Some LSTM variants also use modifiable self-connections of CECs (Gers and Schmidhuber, 2001).…”
Section: Supervised Recurrent Very Deep Learner (LSTM RNN) (mentioning)
confidence: 99%
“…Together with recent parallel or subsequent work (e.g. Bodén and Wiles, 2000; Gers and Schmidhuber, 2001; Rodriguez, 2001) on learning of MA and similar tasks our study might therefore contribute to a better understanding of the capabilities of recurrent neural networks to process natural language. While our first experimental results were announced in (Chalup and Blair, 1999) we now provide a detailed exposition of our investigation.…”
(mentioning)
confidence: 99%
“…With the aim of investigating learnability of the one-step look-ahead prediction task for non-regular languages we performed experiments using training sequences formed by strings of one of the following types, where always n ≥ 1: … Since our interest is focused on the language of multiple agreements MA = {s_n; n ≥ 1} we will from now on use strings s_n = a… Since the depth n is not known to the network at the start of a string it cannot predict when the first b will occur and logically it cannot know how many a's… Some authors have applied a more rigid interpretation, insisting that the output for the predicted symbol must be above a fixed threshold and that the outputs for all other symbols must be below that threshold (e.g. Rodriguez et al., 1999; Bodén and Wiles, 2000; Gers and Schmidhuber, 2001). …”
Section: Prediction Task (mentioning)
confidence: 99%
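The stricter acceptance criterion described in the excerpt above can be stated as a small check. The sketch below is one illustrative reading of that criterion, not code from any of the cited papers; the 0.5 threshold, the dictionary interface, and the use of a set of legal next symbols (needed when more than one continuation is grammatical) are assumptions.

# Illustrative sketch of the threshold-based acceptance criterion mentioned
# in the quote above: every legal next symbol must be above a fixed threshold
# and every other symbol below it. Threshold value and interface are assumed.

def strict_prediction_ok(outputs, legal_next, threshold=0.5):
    """outputs: dict mapping symbol -> activation of its output unit.
    legal_next: set of symbols that may legally occur next.
    Returns True iff all legal symbols are above and all others below threshold."""
    for symbol, activation in outputs.items():
        if symbol in legal_next:
            if activation <= threshold:
                return False
        elif activation >= threshold:
            return False
    return True


if __name__ == "__main__":
    # After reading "aab" of a^2 b^2 c^2, only "b" is a legal next symbol.
    print(strict_prediction_ok({"a": 0.1, "b": 0.9, "c": 0.2}, {"b"}))  # True
    print(strict_prediction_ok({"a": 0.6, "b": 0.9, "c": 0.2}, {"b"}))  # False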