2019
DOI: 10.48550/arXiv.1906.08862
Preprint

Neural Stored-program Memory

Hung Le,
Truyen Tran,
Svetha Venkatesh

Abstract: Neural networks powered with external memory simulate computer behaviors. These models, which use the memory to store data for a neural controller, can learn algorithms and other complex tasks. In this paper, we introduce a new memory to store weights for the controller, analogous to the stored-program memory in modern computer architectures. The proposed model, dubbed Neural Stored-program Memory, augments current memory-augmented neural networks, creating differentiable machines that can switch programs throu…
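The abstract is cut off above, but the analogy it draws is concrete enough to illustrate: alongside the usual data memory, a second memory holds candidate weight matrices ("programs") for the controller, which selects and blends them by content-based addressing. Below is a minimal NumPy sketch of that reading step; the class name, dimensions, query interface, and the plain softmax over program keys are illustrative assumptions made for this page, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class StoredProgramMemory:
    """Toy stored-program memory: each slot pairs a key with a flattened
    weight matrix ("program") that the controller can load."""
    def __init__(self, n_programs, key_dim, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.keys = rng.standard_normal((n_programs, key_dim))
        self.programs = rng.standard_normal((n_programs, in_dim * out_dim))
        self.shape = (in_dim, out_dim)

    def read(self, query):
        # content-based addressing over program keys
        attn = softmax(self.keys @ query / np.sqrt(query.size))
        # blend the stored programs into one working weight matrix
        weights = attn @ self.programs
        return weights.reshape(self.shape), attn

# Usage: the controller fetches its own weights before processing an input.
mem = StoredProgramMemory(n_programs=4, key_dim=8, in_dim=16, out_dim=16)
query = np.ones(8)                    # in the real model, emitted by the controller
W, attn = mem.read(query)
h = np.tanh(np.random.randn(16) @ W)  # one controller step with the fetched program
```

Changing the query changes the attention over program slots, which is what lets the same machine behave as if it were running different programs on different inputs.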

Cited by 3 publications (5 citation statements)
References 14 publications

“…We note that this problem of finding techniques for efficient storage and retrieval of memory elements is not new in the machine learning research community and there's been multiple proposals for achieving efficient storage and retrieval for different domains (Santoro et al., 2016; Gulcehre et al., 2017; Le et al., 2019). One of the simplest of these proposals is the Neural Turing Machine (NTM) and the memory addressing mechanism it proposes.…”
Section: Matrix Representation In Neural Memory (mentioning)
confidence: 99%
“…Many neural network models take the form of a first order (in weights) recurrent neural network (RNN) and have been taught to learn context free and context-sensitive counter languages [17,9,5,64,70,56,48,66,8,36,8,67]. However, from a theoretical perspective, RNNs augmented with an external memory have historically been shown to be more capable of recognizing context free languages (CFLs), such as with a discrete stack [10,55,61], or, more recently, with various differentiable memory structures [33,26,24,39,73,28,72,25,40,41,3,42]. Despite positive results, prior work on CFLs was unable to achieve perfect generalization on data beyond the training dataset, highlighting a troubling difficulty in preserving long term memory.…”
Section: Related Work (mentioning)
confidence: 99%
“…In a more recent parallel thread in machine learning, various memory networks have been devised to augment traditional neural networks [Graves et al., 2014, Sukhbaatar et al., 2015, Munkhdalai et al., 2019, Le et al., 2019, Bartunov et al., 2019]. Memory-augmented neural networks utilize a more stable external memory system analogous to computer memory, in contrast to more volatile storage mechanisms such as recurrent neural networks [Rodriguez et al., 2019].…”
Section: Introduction (mentioning)
confidence: 99%
“…Key-value networks date back to at least the 1980s with Sparse Distributed Memory (SDM) as a model of human long-term memory [Kanerva, 1988, 1992]. Inspired by random-access memory in computers, it is at the core of many memory networks recently developed in machine learning [Graves et al., 2014, Sukhbaatar et al., 2015, Banino et al., 2020, Le et al., 2019]. A basic key-value network contains a key matrix K and a value matrix V.…”
Section: Introduction (mentioning)
confidence: 99%
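As a concrete illustration of the key-value read described in the excerpt above, the sketch below scores a query against the key matrix K and returns the attention-weighted mix of rows of the value matrix V. The function name and dimensions are arbitrary choices for this example, not taken from any of the cited papers.

```python
import numpy as np

def key_value_read(query, K, V):
    """Basic key-value memory read: score the query against every key,
    normalize the scores, and return the weighted sum of the values."""
    scores = K @ query                  # one similarity score per slot
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                  # softmax over memory slots
    return attn @ V                     # blended value vector

K = np.random.randn(32, 16)   # 32 slots, 16-dimensional keys
V = np.random.randn(32, 64)   # 64-dimensional stored values
out = key_value_read(np.random.randn(16), K, V)   # shape (64,)
```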