2000
DOI: 10.1162/089976600300015637

On “Natural” Learning and Pruning in Multilayered Perceptrons

Abstract: Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this paper, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the paper discusses a layered approximation of the Fisher…
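As a hedged illustration of the link named in the abstract (standard optimal-brain-surgeon notation, not taken from the paper): for a squared-error likelihood the Fisher matrix G approximates the Hessian H used by optimal brain surgeon, so the usual OBS saliency of a weight w_q can be written with G in place of H.

    % Sketch only; standard OBS saliency with the Fisher matrix as Hessian proxy.
    \[
      G(w) \;=\; \mathrm{E}\!\left[\nabla_w \log p(y \mid x; w)\,\nabla_w \log p(y \mid x; w)^{\top}\right]
      \;\approx\; H(w),
      \qquad
      \Delta E_q \;=\; \frac{w_q^2}{2\,\bigl[G^{-1}(w)\bigr]_{qq}} .
    \]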

Cited by 48 publications (34 citation statements). References 13 publications.
“…Its logarithm is

(4)  \log p(y \mid x; w) = -\tfrac{1}{2}\,\|y - f(x; w)\|^2 + \text{const}.

Hence, maximization of the conditional log-likelihood is equivalent to the minimization of the square error

(5)  E(w) = \tfrac{1}{2}\,\|y - f(x; w)\|^2.

The natural gradient learning can be described by the following updating rule:

(6)  w_{t+1} = w_t - \eta_t\, G^{-1}(w_t)\, \nabla_w E(w_t),

where \eta_t is the learning rate and G^{-1}(w) (7) is the inverse of the so-called Riemannian metric tensor (Fisher information matrix) of the weight space. It is defined as

(8)  G(w) = \mathrm{E}\bigl[\nabla_w \log p(y \mid x; w)\,\nabla_w \log p(y \mid x; w)^{\top}\bigr],

where \mathrm{E} denotes expectation with respect to the density (2).…”
Section: Adaptive Methods of Realizing Online Natural Gradient Learning
confidence: 99%
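A minimal numerical sketch of the updating rule (6) with the Fisher matrix (8), assuming a toy model that is linear in its weights and unit-variance Gaussian noise; the function and variable names (model_grad, natural_gradient_step, ...) are illustrative, not from the cited papers.

    import numpy as np

    def model_grad(x, w):
        """Gradient of the model output f(x; w) = w^T x with respect to w.

        A linear toy model keeps the sketch self-contained; for an MLP this
        would be the back-propagated gradient of the network output.
        """
        return x

    def natural_gradient_step(w, X, Y, eta=0.1, jitter=1e-6):
        """One batch estimate of the natural gradient update, eq. (6).

        For unit-variance Gaussian noise, grad log p(y|x;w) = (y - f(x;w)) * df/dw,
        so the Fisher matrix (8) is the average outer product of these gradients.
        """
        grads = []
        for x, y in zip(X, Y):
            g = (y - w @ x) * model_grad(x, w)   # per-sample grad of log-likelihood
            grads.append(g)
        grads = np.array(grads)

        G = grads.T @ grads / len(X)              # Fisher matrix estimate, eq. (8)
        grad_E = -grads.mean(axis=0)              # gradient of the square error, eq. (5)

        # Natural gradient step, eq. (6); the jitter keeps the solve well defined.
        return w - eta * np.linalg.solve(G + jitter * np.eye(len(w)), grad_E)

    # Usage: a few steps on synthetic data.
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(200, 3))
    Y = X @ w_true + 0.1 * rng.normal(size=200)
    w = np.zeros(3)
    for _ in range(20):
        w = natural_gradient_step(w, X, Y)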
“…Inserting (5), the quantity G(w) can be written as

(9)  G(w) = \mathrm{E}\bigl[\nabla_w f(x; w)\,\nabla_w f(x; w)^{\top}\bigr],

where \top denotes transposition of a vector or matrix. The problem with the natural gradient learning (6) is that the input-output density is generally unknown. Moreover, one also needs to compute the inverse of this matrix.…”
Section: Adaptive Methods of Realizing Online Natural Gradient Learning
confidence: 99%
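The inversion problem noted above is commonly side-stepped by updating an estimate of the inverse Fisher matrix directly. A hedged sketch, assuming G is tracked as an exponential moving average of gradient outer products so that its inverse can be refreshed with the Sherman-Morrison identity instead of a full O(P^3) inversion (names are illustrative):

    import numpy as np

    def update_inverse_fisher(G_inv, g, eps=0.01):
        """Rank-one update of an inverse-Fisher estimate.

        If G_new = (1 - eps) * G + eps * g g^T, the Sherman-Morrison identity
        yields G_new^{-1} from G^{-1} without re-inverting the full matrix.
        """
        G_inv_scaled = G_inv / (1.0 - eps)
        Gg = G_inv_scaled @ g
        denom = 1.0 + eps * (g @ Gg)
        return G_inv_scaled - (eps / denom) * np.outer(Gg, Gg)

    # Usage: maintain the inverse estimate online while gradients stream in.
    P = 5
    G_inv = np.eye(P)                  # start from the identity
    rng = np.random.default_rng(1)
    for _ in range(1000):
        g = rng.normal(size=P)         # stands in for grad log p(y|x; w)
        G_inv = update_inverse_fisher(G_inv, g)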
“…For applications to learning MLP networks, it was proposed in [5] that the dependencies between weights in different layers be ignored, making the approximate matrix G block diagonal and hence easier to invert, since only the separate blocks need to be inverted.…”
Section: Natural Gradient Descent
confidence: 99%
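A minimal sketch of the block-diagonal approximation described above, assuming per-sample, per-layer gradient vectors are available so that only one Fisher block per layer is estimated and inverted (names are illustrative):

    import numpy as np

    def blockwise_natural_gradient(layer_grads, eta=0.1, jitter=1e-6):
        """Natural-gradient directions under a block-diagonal Fisher approximation.

        layer_grads[l] is an (N, P_l) array of per-sample log-likelihood gradients
        for the weights of layer l. Cross-layer blocks of G are ignored, so each
        layer's Fisher block is estimated and inverted on its own.
        """
        steps = []
        for g in layer_grads:
            G_block = g.T @ g / g.shape[0]        # this layer's Fisher block
            grad_E = -g.mean(axis=0)              # this layer's error gradient
            steps.append(-eta * np.linalg.solve(
                G_block + jitter * np.eye(g.shape[1]), grad_E))
        return steps

    # Usage with two dummy layers of 6 and 3 weights.
    rng = np.random.default_rng(2)
    layer_grads = [rng.normal(size=(100, 6)), rng.normal(size=(100, 3))]
    deltas = blockwise_natural_gradient(layer_grads)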
“…The concept of natural gradient has further been extended to more general classes of multidimensional regression and classification problems in [15]. An alternative derivation of the natural gradient is given in [16], together with the natural equivalent of batch learning, linked to Levenberg-Marquardt optimization. Recently, the special case of learning for non-linear discriminant networks was improved by use of the natural gradient in [17].…”
Section: Introduction
confidence: 99%
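As a hedged sketch of the Levenberg-Marquardt link mentioned in connection with [16] (standard notation, not the paper's own): for a squared-error cost the Fisher matrix coincides, up to averaging over the data, with the Gauss-Newton approximation of the Hessian, and a damped batch natural-gradient step then takes the familiar Levenberg-Marquardt form.

    % Sketch only; Gauss-Newton/Fisher matrix and the damped batch update.
    \[
      G(w) \;\approx\; \sum_{n} \nabla_w f(x_n; w)\,\nabla_w f(x_n; w)^{\top},
      \qquad
      \Delta w \;=\; -\bigl(G(w) + \lambda I\bigr)^{-1} \nabla_w E(w).
    \]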