To perform nontrivial, real-time computations on a sensory input stream, biological systems must retain a short-term memory trace of their recent inputs. It has been proposed that generic high-dimensional dynamical systems could retain a memory trace for past inputs in their current state. This raises important questions about the fundamental limits of such memory traces and the properties required of dynamical systems to achieve these limits. We address these issues by applying Fisher information theory to dynamical systems driven by time-dependent signals corrupted by noise. We introduce the Fisher Memory Curve (FMC) as a measure of the signal-to-noise ratio (SNR) embedded in the dynamical state relative to the input SNR. The integrated FMC indicates the total memory capacity. We apply this theory to linear neuronal networks and show that the capacity of networks with normal connectivity matrices is exactly 1 and that of any network of N neurons is, at most, N. A nonnormal network achieving this bound is subject to stringent design constraints: it must have a hidden feedforward architecture that superlinearly amplifies its input for a time of order N, and the input connectivity must optimally match this architecture. The memory capacity of networks subject to saturating nonlinearities is further limited, and cannot exceed √N. This limit can be realized by feedforward structures with divergent fan-out that distributes the signal across neurons, thereby avoiding saturation. We illustrate the generality of the theory by showing that memory in fluid systems can be sustained by transient nonnormal amplification due to convective instability or the onset of turbulence.

Fisher information | fluid mechanics | network dynamics

Critical cognitive phenomena such as planning and decision-making rely on the ability of the brain to hold information in short-term memory. It is thought that the neural substrate for such memory can arise from persistent patterns of neural activity, or attractors, that are stabilized through reverberating positive feedback, either at the single-cell (1) or network (2, 3) level. However, such simple attractor mechanisms are incapable of remembering sequences of past inputs.

More recent proposals (4-6) have suggested that an arbitrary recurrent network could store information about recent input sequences in its transient dynamics, even if the network does not have information-bearing attractor states. Downstream readout networks can then be trained to instantaneously extract relevant functions of the past input stream to guide future actions. A useful analogy (4) is the surface of a liquid. Even though this surface has no attractors, save the trivial one in which it is flat, transient ripples on the surface can nevertheless encode information about past objects that were thrown in.

This proposal raises a host of important theoretical questions. Are there any fundamental limits on the lifetimes of such transient memory traces? How do these limits depend on the size of the network? If fundamental limi...
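As a concrete illustration of the capacity claims above, the following is a minimal numerical sketch that is not part of the paper itself. It assumes the discrete-time linear model x(t) = W x(t-1) + v s(t) + z(t) with unit-variance Gaussian noise, and takes the Fisher Memory Curve to be J(k) = v^T (W^k)^T C^{-1} W^k v with state noise covariance C = sum_m W^m (W^m)^T; these definitions are consistent with the paper's later analysis but are assumed here, and all variable names and parameter choices (the scaled orthogonal matrix, the delay-line weight w, N = 20) are illustrative.

import numpy as np

def fisher_memory_curve(W, v, kmax):
    # Sketch of the assumed FMC: J(k) = v^T (W^k)^T C^{-1} W^k v,
    # with state noise covariance C = sum_m W^m (W^m)^T (unit noise variance).
    N = W.shape[0]
    C = np.zeros((N, N))
    Wm = np.eye(N)
    for _ in range(5000):               # truncate the series; assumes it converges
        C += Wm @ Wm.T
        Wm = W @ Wm
        if np.linalg.norm(Wm) < 1e-12:
            break
    Cinv = np.linalg.inv(C)
    # J(k): SNR the current state carries about the input injected k steps ago.
    J = []
    Wk_v = v.copy()
    for k in range(kmax):
        J.append(float(Wk_v @ Cinv @ Wk_v))
        Wk_v = W @ Wk_v
    return np.array(J)

N = 20
rng = np.random.default_rng(0)

# Normal connectivity: a scaled random orthogonal matrix (spectral radius 0.9).
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
W_normal = 0.9 * Q
v = np.ones(N) / np.sqrt(N)             # unit-norm input weights

# Nonnormal connectivity: a feedforward delay line that amplifies the signal
# by a factor w at every step, with the input fed only into the first neuron.
w = 3.0
W_chain = np.diag(w * np.ones(N - 1), k=-1)
e1 = np.zeros(N)
e1[0] = 1.0

J_normal = fisher_memory_curve(W_normal, v, 2 * N)
J_chain = fisher_memory_curve(W_chain, e1, 2 * N)
print("total capacity, normal network:", J_normal.sum())   # close to 1
print("total capacity, delay line    :", J_chain.sum())    # close to N*(1 - 1/w**2)

Under these assumptions the normal network's integrated FMC sums to roughly 1 regardless of N, while the amplifying delay line's sum approaches N, illustrating the contrast between normal and nonnormal connectivity stated in the abstract.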