Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2117

On the Practical Computational Power of Finite Precision RNNs for Language Recognition

Abstract: While Recurrent Neural Networks (RNNs) are famously known to be Turing complete, this relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU.
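The separation claimed in the abstract comes down to whether the state update can accumulate values without bound. Below is a minimal NumPy sketch (hand-set 1x1 weights chosen purely for illustration, not taken from the paper) contrasting a ReLU Elman update with a squashing (tanh) update:

```python
import numpy as np

def relu_rnn_step(h, x, W_h, W_x):
    # Elman RNN with ReLU: the state is unbounded above, so a unit
    # can accumulate +1 per input symbol and act as a counter.
    return np.maximum(0.0, W_h @ h + W_x @ x)

def tanh_rnn_step(h, x, W_h, W_x):
    # Squashing activation: the state is confined to (-1, 1), so under
    # finite precision only finitely many state values are reachable.
    return np.tanh(W_h @ h + W_x @ x)

# Hand-set weights (illustrative assumptions, not learned values).
W_h = np.array([[1.0]])
W_x = np.array([[1.0]])
h_relu = np.array([0.0])
h_tanh = np.array([0.0])
for _ in range(5):              # feed five identical symbols x = 1
    x = np.array([1.0])
    h_relu = relu_rnn_step(h_relu, x, W_h, W_x)
    h_tanh = tanh_rnn_step(h_tanh, x, W_h, W_x)
print(h_relu)   # [5.]    -- counts the symbols
print(h_tanh)   # ~[0.96] -- saturates and loses the count
```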

Cited by 165 publications (187 citation statements)
References 15 publications
“…[14] LSTMs are more powerful than GRU networks, as they are able to learn a counting mechanism. [89] Combined with a simple hill-climb algorithm for optimization (an off-policy policy gradient algorithm with binary rewards, which can also be interpreted as iterative fine-tuning), the LSTM has recently been shown to perform as well as more sophisticated reinforcement learning algorithms such as proximal policy optimization (PPO) or advantage actor-critic (A2C). [49] The model used was an LSTM with 3 layers and a hidden size of 1024.…”
Section: SMILES LSTM
confidence: 99%
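For concreteness, here is a minimal PyTorch sketch of the architecture the excerpt mentions (3 LSTM layers, hidden size 1024). The vocabulary and embedding sizes are placeholder assumptions, as the excerpt does not specify them; this is not the cited authors' code:

```python
import torch
import torch.nn as nn

class SmilesLSTM(nn.Module):
    """Sketch of the model described in the excerpt: a 3-layer LSTM
    with hidden size 1024. Vocabulary and embedding sizes are assumed."""
    def __init__(self, vocab_size=64, embed_dim=256,
                 hidden_size=1024, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer-encoded SMILES characters
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state  # next-character logits per position
```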
“…Also related is the work of Weiss et al (2018), who demonstrate that LSTMs are able to count infinitely, since their cell states are unbounded, while GRUs cannot count infinitely since the activations are constrained to a finite range. One avenue of future work could compare the performance of LSTMs and GRUs on the memorization task.…”
Section: Related Work
confidence: 99%
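The contrast the excerpt draws is visible directly in the two update equations: the LSTM cell update is additive, while the GRU update is a convex interpolation between values in (-1, 1). A small numeric sketch with hand-picked saturated gate values (illustrative assumptions, not learned parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Saturated gate values chosen by hand for illustration.
f = i = z = sigmoid(10.0)   # forget/input/update gates ~= 1
g = h_new = np.tanh(10.0)   # candidate values ~= 1

# LSTM: c_t = f * c_{t-1} + i * g is additive, so the cell state
# grows roughly +1 per step -- an unbounded counter.
c = 0.0
for _ in range(1000):
    c = f * c + i * g
print(c)    # ~978: keeps growing as steps increase

# GRU: h_t = (1 - z) * h_{t-1} + z * h_new is a convex combination
# of values in (-1, 1), so |h_t| < 1 at every step -- it cannot count.
h = 0.0
for _ in range(1000):
    h = (1.0 - z) * h + z * h_new
print(h)    # ~1.0, but strictly inside (-1, 1)
```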
“…However, there still remain some fundamental questions regarding the practical computational expressivity of RNNs with finite precision. Weiss et al. (2018) have recently demonstrated that Long Short-Term Memory (LSTM) models (Hochreiter and Schmidhuber, 1997), a popular variant of RNNs, can theoretically emulate a simple real-time k-counter machine, which can be described as a finite-state controller with k separate counters, each holding an integer value and capable of adding ±1 or 0 to its content at each time step (Fischer et al., 1968). The authors further tested their theoretical result by training LSTM networks to learn aⁿbⁿ and aⁿbⁿcⁿ.…”
Section: Introduction
confidence: 99%
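A real-time counter machine of this kind is simple to state directly. Below is a hedged Python sketch of a 1-counter recognizer for aⁿbⁿ (aⁿbⁿcⁿ is analogous with k = 2 counters); it illustrates the model the excerpt describes and is not code from the cited papers:

```python
def recognize_anbn(s: str) -> bool:
    """A real-time 1-counter machine for a^n b^n (n >= 1): a finite
    controller plus one integer counter that is incremented,
    decremented, or left unchanged at each input symbol."""
    counter = 0
    state = "A"                 # A: reading a's, B: reading b's
    for ch in s:
        if state == "A" and ch == "a":
            counter += 1        # +1 for each 'a'
        elif ch == "b" and counter > 0:
            state = "B"
            counter -= 1        # -1 for each 'b'
        else:
            return False        # wrong symbol order or too many b's
    return state == "B" and counter == 0

assert recognize_anbn("aaabbb") and not recognize_anbn("aabbb")
```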