PACS. … - theory and mathematical aspects.
PACS. 05.50.+q - Lattice theory and statistics (Ising, Potts, etc.).
PACS. 64.60.Cn - Order-disorder transformations; statistical mechanics of model systems.

Abstract. - An unsupervised learning procedure based on maximizing the mutual information between the outputs of two networks receiving different but statistically dependent inputs is analyzed (Becker S. and Hinton G. E., Nature, 355 (1992) 161). By exploiting a formal analogy to supervised learning in parity machines, the theory of zero-temperature Gibbs learning for the unsupervised procedure is presented for the case that the networks are perceptrons and for the case of fully connected committee machines.

It has long been realized that information theory can provide a useful conceptual framework for unsupervised learning in neural networks. The basic idea is to treat the network as a channel of limited capacity and to adapt the parameters of the network so as to optimize the information transfer. Different optimality criteria exist, and a discussion of such criteria, and of some equivalences between them, from the perspective of statistical physics is given in [1,2].

In most cases, however, the information-theoretic approach has led to useful learning algorithms only for single-layer networks. An exception is the proposal by Becker and Hinton (see [3]; a review is given in [4]), which builds on ideas from computational linguistics. They assume two different but statistically dependent modes of input, $\xi_1$ and $\xi_2$, and each of these modes is processed by a different network. Owing to the statistical dependence, some features of one input mode will be predictable given the other input mode, and the goal of training is to discover such mutually predictable features by maximizing the mutual information between the outputs of the two networks.

It is worth mentioning that in the context of sensory processing the scenario considered by Becker and Hinton is by no means artificial. For instance, simultaneous auditory ($\xi_1$) and visual ($\xi_2$) sensations are statistically dependent since they may be caused by the same object, and such dependences obviously provide useful information about the nature of the object. Statistical dependences also arise within one sensory system at different times: in speech, the current phoneme ($\xi_1$) is to a certain extent predictable from the preceding ones ($\xi_2$), and the same phenomenon recurs at the level of words.

In an application to vision, a simulation in [3] shows that two multilayer networks can learn higher-order features by maximizing the mutual information of their outputs. They learn to estimate the distance of an object from the stereo disparity which may arise when the object
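To make the training principle concrete, the following minimal sketch (ours, not from [3]; the toy data model and all names are illustrative assumptions) maximizes the empirical mutual information between the binary outputs of two perceptrons that receive noisy views of a common hidden feature. A crude zero-temperature stochastic search stands in for the Gibbs learning analyzed here; it illustrates only the objective, not the analytical theory.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, A = 20, 2000, 1.5  # input dimension, sample size, signal strength (all illustrative)

# Toy data model (our assumption): a hidden binary feature s is embedded along
# a common direction B in both input modes, plus independent Gaussian noise,
# so s is the only mutually predictable feature of xi1 and xi2.
B = rng.standard_normal(N)
B /= np.linalg.norm(B)
s = rng.choice([-1.0, 1.0], size=P)
xi1 = A * s[:, None] * B + rng.standard_normal((P, N))
xi2 = A * s[:, None] * B + rng.standard_normal((P, N))

def outputs(w, xi):
    """Binary perceptron outputs sigma = sign(w . xi)."""
    return np.sign(xi @ w)

def mutual_information(s1, s2):
    """Empirical mutual information (in bits) of two +-1 output sequences."""
    eps = 1e-12
    joint = np.array([[np.mean((s1 == a) & (s2 == b)) for b in (-1, 1)]
                      for a in (-1, 1)])
    marg = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return float(np.sum(joint * np.log2((joint + eps) / (marg + eps))))

# Zero-temperature stochastic search: propose small weight perturbations for
# both perceptrons and accept only moves that do not lower the empirical
# mutual information of the two output sequences.
w1, w2 = rng.standard_normal(N), rng.standard_normal(N)
best = mutual_information(outputs(w1, xi1), outputs(w2, xi2))
for _ in range(5000):
    t1 = w1 + 0.1 * rng.standard_normal(N)
    t2 = w2 + 0.1 * rng.standard_normal(N)
    trial = mutual_information(outputs(t1, xi1), outputs(t2, xi2))
    if trial >= best:
        w1, w2, best = t1, t2, trial

print(f"mutual information of outputs: {best:.3f} bits")
print(f"|overlap of w1 with B|: {abs(w1 @ B) / np.linalg.norm(w1):.2f}")
```

In this toy setting the search typically drives the output mutual information well above its chance level and aligns both weight vectors (up to a common sign flip, which leaves the mutual information invariant) with the shared direction B, i.e. with the mutually predictable feature.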