Published: 1998
DOI: 10.1088/0954-898x_9_2_004

Nonlinear feedforward networks with stochastic outputs: infomax implies redundancy reduction

Abstract: We prove that maximization of mutual information between the output and the input of a feedforward neural network leads to full redundancy reduction under the following sufficient conditions: (i) the input signal is a (possibly nonlinear) invertible mixture of independent components; (ii) there is no input noise; (iii) the activity of each output neuron is a (possibly) stochastic variable with a probability distribution depending on the stimulus through a deterministic function of the inputs (where both the pr…
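
As a concrete illustration of the theorem stated in the abstract (a numerical sketch under the paper's stated assumptions, not code from the paper): when the input components are already independent and there is no noise, passing each component through its own cumulative distribution function is an infomax solution, and the resulting outputs form a factorial code with vanishing redundancy between channels. The Laplace sources and the helper empirical_cdf_transform below are illustrative choices only.

```python
# Minimal sketch: independent inputs, no noise, deterministic output
# nonlinearity g_i chosen as the (empirical) CDF of each input component.
# The resulting outputs are ~uniform and ~independent, i.e. a factorial code.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Condition (i): an invertible mixture of independent components (here the
# trivial identity mixture, so the sources themselves are the inputs).
s = rng.laplace(size=(n, 2))

def empirical_cdf_transform(x):
    # rank / (N + 1) approximates F_i(x_i) column-wise; outputs lie in (0, 1)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (x.shape[0] + 1)

y = empirical_cdf_transform(s)

# Redundancy check: cross-channel correlation ~0 and ~uniform marginals,
# consistent with "infomax implies redundancy reduction".
print("output correlation:", np.corrcoef(y.T)[0, 1])
print("marginal std (uniform gives ~0.289):", y.std(axis=0))
```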

Cited by 26 publications (10 citation statements)
References 14 publications
“…In order to measure the retrieval quality of the recall process, we use the mutual information function [5,6,13,14]. In general, it measures the average amount of information that can be received by the user by observing the signal at the output of a channel [15,16].…”
Section: The Model (mentioning)
confidence: 99%
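
A minimal sketch of the quantity this excerpt refers to: the mutual information of a discrete channel, computed directly from the joint input/output distribution. This is the textbook definition, not code from the cited works; the binary symmetric channel at the end is an assumed example.

```python
# I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ]
import numpy as np

def mutual_information(joint):
    """joint[i, j] = P(X = i, Y = j); returns I(X;Y) in bits."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of the input
    py = joint.sum(axis=0, keepdims=True)   # marginal of the output
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

# Example: binary symmetric channel, crossover 0.1, uniform input.
eps = 0.1
joint_bsc = 0.5 * np.array([[1 - eps, eps],
                            [eps, 1 - eps]])
print(mutual_information(joint_bsc))  # ~0.531 bits = 1 - H2(0.1)
```
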
“…This function depends on M_1(t), q(t), a, α and β. The evolution of M_1(t) and of q(t) (13), (14) depends on the specific choice of the threshold through the local field (2). We consider a layer-independent threshold θ(t) = θ and calculate the value of (12) for fixed a, α, M_1^0, q_0 and β.…”
Section: Adaptive Thresholds (mentioning)
confidence: 99%
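
A hypothetical sketch of the procedure described in this excerpt: scan a grid of layer-independent thresholds θ and keep the one that maximizes the information measure for fixed a, α, M_1^0, q_0 and β. The function information_measure below is only a placeholder standing in for the cited Eq. (12), which is not reproduced here.

```python
# Grid search for the layer-independent threshold that maximizes a given
# information measure; the measure itself is passed in as a callable.
import numpy as np

def best_threshold(information_measure, theta_grid, **fixed_params):
    values = [information_measure(theta, **fixed_params) for theta in theta_grid]
    return theta_grid[int(np.argmax(values))]

# Usage with a dummy, single-peaked stand-in for Eq. (12):
theta_opt = best_threshold(lambda theta, **kw: -(theta - 0.3) ** 2,
                           np.linspace(-1.0, 1.0, 201))
print(theta_opt)  # -> 0.3
```
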
“…However, for non-Gaussian signals, a principal components transformation may still result in coefficients with considerable statistical dependence. This realization has led to the exploration of techniques for extracting coefficients that are independent despite the non-Gaussian nature of the input [5][6][7][8]. Methods for representing non-Gaussian signals may be of particular importance for the neural processing of sensory information where there is considerable evidence that the outputs from neurons involved in early sensory processing are non-Gaussian [9][10][11].…”
Section: Introduction (mentioning)
confidence: 99%
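
A small sketch of the point made in this excerpt (assuming scikit-learn is available; the Laplace sources and mixing matrix are invented for the demonstration): PCA decorrelates the coefficients of a non-Gaussian mixture but can leave higher-order dependence, whereas an ICA-style method targets statistically independent coefficients.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(1)
n = 20_000
s = rng.laplace(size=(n, 2))            # independent non-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])              # invertible mixing matrix
x = s @ A.T                             # observed mixture

y_pca = PCA(whiten=True).fit_transform(x)
y_ica = FastICA(random_state=0).fit_transform(x)

# Probe for higher-order dependence: correlation between squared coefficients
# (zero second-order correlation does not imply this is small).
dep = lambda y: abs(np.corrcoef(y[:, 0] ** 2, y[:, 1] ** 2)[0, 1])
print("PCA residual dependence:", dep(y_pca))
print("ICA residual dependence:", dep(y_ica))
```
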
“…However, when the transformed input is further transformed by a bounded invertible nonlinear function the amount of information conveyed is maximized by a factorial code. This was initially demonstrated for a single dimension [17], and later for multiple dimensions [5][6][7][8], where information maximization [18] was seen to lead to a factorial code. Although noise was initially considered to be vanishingly small, subsequent work [6,8] reached similar conclusions when noise was present, including the case where the noise was proportional to the mean of the input distribution [8].…”
Section: Introduction (mentioning)
confidence: 99%
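
For orientation, a generic infomax learning rule of the kind this excerpt refers to: a Bell-Sejnowski-style update with bounded logistic outputs, written in natural-gradient form. This is a standard illustrative sketch under assumed super-Gaussian sources, not the multi-dimensional derivation given in the cited works.

```python
# Natural-gradient infomax: dW ∝ (I + (1 - 2y) u^T) W with y = sigmoid(u),
# u = W x. Maximizing output entropy drives the outputs toward a factorial code.
import numpy as np

rng = np.random.default_rng(2)
n, d = 50_000, 2
s = rng.laplace(size=(n, d))                 # independent super-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                   # invertible mixing matrix
x = s @ A.T                                  # observed mixture

W = np.eye(d)                                # unmixing matrix to be learned
lr = 0.01
for epoch in range(10):
    for batch in np.array_split(x, 500):
        u = batch @ W.T                      # current unmixed activities
        y = 1.0 / (1.0 + np.exp(-u))         # bounded invertible nonlinearity
        grad = np.eye(d) + (1.0 - 2.0 * y).T @ u / len(batch)
        W += lr * grad @ W                   # natural-gradient infomax step

# If a factorial code has been found, W @ A is close to a scaled permutation.
print(np.round(W @ A, 2))
```
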
“…The basic idea is to treat the network as a channel of limited capacity and to adapt the parameters of the network to optimize the information transfer. Different optimality criteria exist, and a discussion of such criteria and some equivalences between them from the perspective of statistical physics is given in [1,2]. In most cases, however, the information-theoretic approach has led to useful learning algorithms only for single-layer networks. An exception is the proposal by Becker and Hinton (see [3]; a review is given in [4]), which builds on ideas from computational linguistics.…”
(mentioning)
confidence: 99%