Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges 2019
DOI: 10.18653/v1/w19-3905

LSTM Networks Can Perform Dynamic Counting

Abstract: In this paper, we systematically assess the ability of standard recurrent networks to perform dynamic counting and to encode hierarchical representations. All the neural models in our experiments are designed to be small-sized networks, both to prevent them from memorizing the training sets and to visualize and interpret their behaviour at test time. Our results demonstrate that Long Short-Term Memory (LSTM) networks can learn to recognize the well-balanced parenthesis language (Dyck-1) and the shuffles of multiple Dyck-1 languages…
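As a rough illustration of the counting mechanism the paper studies (a sketch of the target language, not of the authors' networks), Dyck-1 membership can be decided with a single counter that goes up on '(' and down on ')':

```python
def is_dyck1(s: str) -> bool:
    """Recognize the well-balanced parenthesis language (Dyck-1) with a
    single counter -- the kind of dynamic counting an LSTM cell state
    can emulate."""
    count = 0
    for ch in s:
        if ch == "(":
            count += 1              # one more unmatched open bracket
        elif ch == ")":
            count -= 1              # close the most recent open bracket
            if count < 0:           # a ')' with nothing left to match
                return False
        else:
            return False            # symbol outside the Dyck-1 alphabet
    return count == 0               # accept only if everything is closed


assert is_dyck1("(()())")
assert not is_dyck1("())(")
```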

Cited by 48 publications (64 citation statements). References 31 publications.
“…There is empirical evidence that the hidden states of LSTM sequence-to-sequence models trained to perform machine translation track sequence length by implementing something akin to a counter that increments during encoding and decrements during decoding (Shi et al., 2016). These results are consistent with theoretical and empirical findings showing that LSTMs can efficiently implement counting mechanisms (Weiss et al., 2018; Suzgun et al., 2019a; Merrill, 2020). Our experiments will show that tracking absolute token position by implementing something akin to these counters makes extrapolation difficult.…”
Section: Related Work (supporting; confidence: 64%)
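A toy illustration (ours, not from the cited work) of the length-tracking behaviour described above: the counter rises by one per source token during encoding and falls by one per target token during decoding, returning to zero exactly when the output is as long as the input.

```python
def simulate_length_counter(source_tokens, target_tokens):
    """Simulate the counter-like dynamics described above: increment once
    per source token (encoding), decrement once per target token
    (decoding); a final value of zero means the lengths matched."""
    count = 0
    for _ in source_tokens:    # encoding phase
        count += 1
    for _ in target_tokens:    # decoding phase
        count -= 1
    return count


assert simulate_length_counter(["a", "b", "c"], ["x", "y", "z"]) == 0
assert simulate_length_counter(["a", "b"], ["x", "y", "z"]) == -1
```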
“…Our results on the parentheses corpora do not necessarily prove that the LSTMs trained on the Nesting Parentheses corpus aren't encoding and utilizing hierarchical structure. In fact, previous research shows that LSTMs are able to successfully model stack-based hierarchical languages (Suzgun et al., 2019b; Yu et al., 2019; Suzgun et al., 2019a). What our results do indicate is that, in order for LSTMs to model human language, being able to model hierarchical structure is similar in utility to having access to a non-hierarchical ability to "look back" at one relevant dependency.…”
Section: Discussion (supporting; confidence: 53%)
“…On the other hand, a long line of research has sought to understand the capabilities of recurrent neural models such as the LSTM (Hochreiter and Schmidhuber, 1997). Recently, Weiss et al. (2018) and Suzgun et al. (2019a) showed that LSTMs are capable of recognizing counter languages such as Dyck-1 and a^n b^n by learning to perform counting-like behavior. Suzgun et al. (2019a) showed that LSTMs can recognize shuffles of multiple Dyck-1 languages, also known as Shuffle-Dyck.…”
Section: Introduction (mentioning; confidence: 99%)
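A minimal sketch of why Shuffle-Dyck is a counter language rather than a stack language: bracket pairs from different Dyck-1 components need not nest with respect to each other, so one independent counter per bracket type suffices (the two-pair bracket inventory below is assumed for illustration).

```python
def is_shuffle_dyck(s: str, pairs=(("(", ")"), ("[", "]"))) -> bool:
    """Recognize a shuffle of Dyck-1 languages: each bracket pair is
    tracked by its own independent counter, with no stack needed."""
    counters = {open_b: 0 for open_b, _ in pairs}
    opener_of = {close_b: open_b for open_b, close_b in pairs}
    for ch in s:
        if ch in counters:
            counters[ch] += 1                 # open bracket of this type
        elif ch in opener_of:
            counters[opener_of[ch]] -= 1      # close bracket of this type
            if counters[opener_of[ch]] < 0:   # closed with nothing open
                return False
        else:
            return False                      # symbol outside the alphabet
    return all(c == 0 for c in counters.values())


# "([)]" is not well-nested, but each bracket type is balanced on its own,
# so it belongs to the shuffle of the two Dyck-1 languages.
assert is_shuffle_dyck("([)]")
assert not is_shuffle_dyck("(]")
```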