Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1205

Recurrent Neural Networks as Weighted Language Recognizers

Abstract: We investigate the computational complexity of various problems for simple recurrent neural networks (RNNs) as formal models for recognizing weighted languages. We focus on the single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications. We show that most problems for such RNNs are undecidable, including consistency, equivalence, minimization, and the determination of the highest-weighted string. However, for consistent RNNs the last prob…
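The model class named in the abstract (single-layer, ReLU-activation, rational-weight RNN with a softmax output) can be made concrete with a small sketch. The code below is a hypothetical illustration, not the paper's construction; all parameter names, shapes, and the random weights are assumptions, and the string weight is taken to be the product of per-symbol softmax probabilities followed by an end-of-string probability.

```python
import numpy as np

# Minimal sketch (assumed names and shapes) of the model class in the abstract:
# a single-layer Elman-style RNN with ReLU activation and a softmax output
# over the vocabulary.  The weight of a string is taken to be the product of
# the per-symbol softmax probabilities, ending with an end-of-string symbol.

rng = np.random.default_rng(0)
V, H = 5, 8                       # vocabulary size (EOS has id 0), hidden size
W_xh = rng.normal(size=(H, V))    # input-to-hidden weights
W_hh = rng.normal(size=(H, H))    # hidden-to-hidden weights
b_h = np.zeros(H)
W_hy = rng.normal(size=(V, H))    # hidden-to-output weights
b_y = np.zeros(V)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def string_weight(symbols, eos=0):
    """Weight assigned to a string: product of next-symbol probabilities,
    including the probability of EOS after the last symbol."""
    h = np.zeros(H)
    weight = 1.0
    for s in list(symbols) + [eos]:
        p = softmax(W_hy @ h + b_y)                      # next-symbol distribution
        weight *= p[s]
        x = np.eye(V)[s]                                 # one-hot input
        h = np.maximum(0.0, W_xh @ x + W_hh @ h + b_h)   # ReLU state update
    return weight

print(string_weight([2, 3, 1]))   # weight of the string "2 3 1"
```

Under this reading, questions such as which string receives the highest weight range over all strings of all lengths, which is where the paper's undecidability results apply.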

Cited by 42 publications (60 citation statements). References 17 publications (19 reference statements).

Citation statements (ordered by relevance):
“…A famous result by Siegelmann and Sontag (1992; 1994), and its extension in (Siegelmann, 1999), demonstrates that an Elman-RNN (Elman, 1990) with a sigmoid activation function, rational weights and infinite precision states can simulate a Turing-machine in real-time, making RNNs Turing-complete. Recently, Chen et al. (2017) extended the result to the ReLU activation function. However, these constructions (a) assume reading the entire input into the RNN state and only then performing the computation, using unbounded time; and (b) rely on having infinite precision in the network states.…”
Section: Introduction (mentioning)
confidence: 93%
“…However, these constructions (a) assume reading the entire input into the RNN state and only then performing the computation, using unbounded time; and (b) rely on having infinite precision in the network states. As argued by Chen et al. (2017), this is not the model of RNN computation used in NLP applications. Instead, RNNs are often used by feeding an input sequence into the RNN one item at a time, each immediately returning a state vector that corresponds to a prefix of the sequence and which can be passed as input for a subsequent feed-forward prediction network operating in constant time.…”
Section: Introduction (mentioning)
confidence: 99%
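The usage pattern described in the excerpt above (feed one item at a time, read off a state vector for each prefix, run a constant-time feed-forward predictor on it) can be sketched as follows; the names, shapes, and random parameters are illustrative assumptions rather than any specific cited model.

```python
import numpy as np

# Hypothetical sketch of the incremental usage pattern described in the excerpt:
# the RNN consumes one input item per step and immediately yields a state
# vector for the prefix read so far; a constant-time feed-forward head then
# makes a prediction from each such state.

rng = np.random.default_rng(1)
D, H, C = 4, 8, 3                              # input dim, hidden dim, classes
W_xh = rng.normal(size=(H, D))
W_hh = rng.normal(size=(H, H))
W_hc = rng.normal(size=(C, H))                 # feed-forward prediction head

def step(h, x):
    """One online step: new state for the current prefix (ReLU Elman update)."""
    return np.maximum(0.0, W_xh @ x + W_hh @ h)

def predict(h):
    """Constant-time prediction from a prefix state."""
    z = W_hc @ h
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(H)
for x in rng.normal(size=(6, D)):              # a toy sequence of 6 input items
    h = step(h, x)                             # state available immediately
    print(predict(h).argmax())                 # per-prefix prediction
```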
“…Ott et al. (2018) argued that uncertainty caused by noisy training data may play a role. Chen et al. (2018) showed that the consistent best string problem for RNNs is decidable. We provide an alternative DFS algorithm that relies on the monotonic nature of model scores rather than consistency, and that often converges in practice.…”
Section: Related Work (mentioning)
confidence: 99%
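The search alluded to in the excerpt above relies on a simple monotonicity fact: in a locally normalized model, extending a prefix multiplies its score by a probability at most 1, so no extension can score higher than the prefix itself. The sketch below shows a depth-first search that prunes on this bound; it is a hedged illustration under that assumption, not the cited authors' implementation, and `next_probs` is a hypothetical callback returning the model's next-symbol distribution for a prefix.

```python
# Hedged sketch of an exact depth-first search for the highest-scoring string
# under a locally normalized model.  `next_probs(prefix)` is a hypothetical
# callback returning a dict {symbol: probability} for the next position.

def exact_search(next_probs, eos, max_len=20):
    best = {"score": 0.0, "string": None}

    def dfs(prefix, score):
        # Monotonicity: every extension multiplies `score` by a probability
        # <= 1, so a prefix that scores no better than the best complete
        # hypothesis found so far can never catch up and is pruned.
        if score <= best["score"]:
            return
        for sym, p in next_probs(prefix).items():
            s = score * p
            if sym == eos:
                if s > best["score"]:
                    best["score"], best["string"] = s, list(prefix)
            elif len(prefix) < max_len:
                dfs(prefix + (sym,), s)

    dfs((), 1.0)
    return best["string"], best["score"]

# Toy model: symbols {1, 2} plus EOS = 0 with a fixed next-symbol distribution.
toy = lambda prefix: {0: 0.5, 1: 0.3, 2: 0.2}
print(exact_search(toy, eos=0, max_len=5))     # -> ([], 0.5): empty string wins
```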
“…Other works have studied the expressive power of RNNs, in particular in the context of WFSAs or HMMs (Cleeremans et al., 1989; Giles et al., 1992; Visser et al., 2001; Chen et al., 2018). In this work we relate CNNs to WFSAs, showing that a one-layer CNN with max-pooling can be simulated by a collection of linear-chain WFSAs.…”
Section: Related Work (mentioning)
confidence: 95%