2000
DOI: 10.1162/089976600300015637

On “Natural” Learning and Pruning in Multilayered Perceptrons

Abstract: Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this paper, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the paper discusses a layered approximation of the Fisher…
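As a hedged illustration of the link named in the abstract (standard optimal-brain-surgeon notation, not taken from the paper): for a squared-error likelihood the Fisher matrix G approximates the Hessian H used by optimal brain surgeon, so the usual OBS saliency of a weight w_q can be written with G in place of H.

    % Sketch only; standard OBS saliency with the Fisher matrix as Hessian proxy.
    \[
      G(w) \;=\; \mathrm{E}\!\left[\nabla_w \log p(y \mid x; w)\,\nabla_w \log p(y \mid x; w)^{\top}\right]
      \;\approx\; H(w),
      \qquad
      \Delta E_q \;=\; \frac{w_q^2}{2\,\bigl[G^{-1}(w)\bigr]_{qq}} .
    \]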

Cited by 48 publications (34 citation statements). References 13 publications.
“…Its logarithm is

(4)  \log p(y \mid x; w) = -\tfrac{1}{2}\,\|y - f(x; w)\|^2 + \text{const}.

Hence, maximization of the conditional log-likelihood is equivalent to the minimization of the square error

(5)  E(w) = \tfrac{1}{2}\,\|y - f(x; w)\|^2.

The natural gradient learning can be described by the following updating rule:

(6)  w_{t+1} = w_t - \eta_t\, G^{-1}(w_t)\, \nabla_w E(w_t),

where \eta_t is the learning rate and G^{-1}(w) (7) is the inverse of the so-called Riemannian metric tensor (Fisher information matrix) of the weight space. It is defined as

(8)  G(w) = \mathrm{E}\bigl[\nabla_w \log p(y \mid x; w)\,\nabla_w \log p(y \mid x; w)^{\top}\bigr],

where \mathrm{E} denotes expectation with respect to the density (2).…”
Section: Adaptive Methods of Realizing Online Natural Gradient Learning
confidence: 99%
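A minimal numerical sketch of the updating rule (6) with the Fisher matrix (8), assuming a toy model that is linear in its weights and unit-variance Gaussian noise; the function and variable names (model_grad, natural_gradient_step, ...) are illustrative, not from the cited papers.

    import numpy as np

    def model_grad(x, w):
        """Gradient of the model output f(x; w) = w^T x with respect to w.

        A linear toy model keeps the sketch self-contained; for an MLP this
        would be the back-propagated gradient of the network output.
        """
        return x

    def natural_gradient_step(w, X, Y, eta=0.1, jitter=1e-6):
        """One batch estimate of the natural gradient update, eq. (6).

        For unit-variance Gaussian noise, grad log p(y|x;w) = (y - f(x;w)) * df/dw,
        so the Fisher matrix (8) is the average outer product of these gradients.
        """
        grads = []
        for x, y in zip(X, Y):
            g = (y - w @ x) * model_grad(x, w)   # per-sample grad of log-likelihood
            grads.append(g)
        grads = np.array(grads)

        G = grads.T @ grads / len(X)              # Fisher matrix estimate, eq. (8)
        grad_E = -grads.mean(axis=0)              # gradient of the square error, eq. (5)

        # Natural gradient step, eq. (6); the jitter keeps the solve well defined.
        return w - eta * np.linalg.solve(G + jitter * np.eye(len(w)), grad_E)

    # Usage: a few steps on synthetic data.
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(200, 3))
    Y = X @ w_true + 0.1 * rng.normal(size=200)
    w = np.zeros(3)
    for _ in range(20):
        w = natural_gradient_step(w, X, Y)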
“…Inserting (5), the quantity G(w) can be written as

(9)  G(w) = \mathrm{E}\bigl[\nabla_w f(x; w)\,\nabla_w f(x; w)^{\top}\bigr],

where \top denotes transposition of a vector or matrix. The problem with the natural gradient learning (6) is that the input-output density is generally unknown. Moreover, one also needs to compute the inverse of this matrix.…”
Section: Adaptive Methods of Realizing Online Natural Gradient Learning
confidence: 99%
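The inversion problem noted above is commonly side-stepped by updating an estimate of the inverse Fisher matrix directly. A hedged sketch, assuming G is tracked as an exponential moving average of gradient outer products so that its inverse can be refreshed with the Sherman-Morrison identity instead of a full O(P^3) inversion (names are illustrative):

    import numpy as np

    def update_inverse_fisher(G_inv, g, eps=0.01):
        """Rank-one update of an inverse-Fisher estimate.

        If G_new = (1 - eps) * G + eps * g g^T, the Sherman-Morrison identity
        yields G_new^{-1} from G^{-1} without re-inverting the full matrix.
        """
        G_inv_scaled = G_inv / (1.0 - eps)
        Gg = G_inv_scaled @ g
        denom = 1.0 + eps * (g @ Gg)
        return G_inv_scaled - (eps / denom) * np.outer(Gg, Gg)

    # Usage: maintain the inverse estimate online while gradients stream in.
    P = 5
    G_inv = np.eye(P)                  # start from the identity
    rng = np.random.default_rng(1)
    for _ in range(1000):
        g = rng.normal(size=P)         # stands in for grad log p(y|x; w)
        G_inv = update_inverse_fisher(G_inv, g)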
“…For applications to learning MLP networks, it was proposed in [5] that the dependencies between weights in different layers be ignored, making the approximate matrix G block diagonal and hence easier to invert, since only the separate blocks need to be inverted.…”
Section: Natural Gradient Descent
confidence: 99%
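A minimal sketch of the block-diagonal approximation described above, assuming per-sample, per-layer gradient vectors are available so that only one Fisher block per layer is estimated and inverted (names are illustrative):

    import numpy as np

    def blockwise_natural_gradient(layer_grads, eta=0.1, jitter=1e-6):
        """Natural-gradient directions under a block-diagonal Fisher approximation.

        layer_grads[l] is an (N, P_l) array of per-sample log-likelihood gradients
        for the weights of layer l. Cross-layer blocks of G are ignored, so each
        layer's Fisher block is estimated and inverted on its own.
        """
        steps = []
        for g in layer_grads:
            G_block = g.T @ g / g.shape[0]        # this layer's Fisher block
            grad_E = -g.mean(axis=0)              # this layer's error gradient
            steps.append(-eta * np.linalg.solve(
                G_block + jitter * np.eye(g.shape[1]), grad_E))
        return steps

    # Usage with two dummy layers of 6 and 3 weights.
    rng = np.random.default_rng(2)
    layer_grads = [rng.normal(size=(100, 6)), rng.normal(size=(100, 3))]
    deltas = blockwise_natural_gradient(layer_grads)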
“…The concept of natural gradient has further been extended to more general classes of multidimensional regression and classification problems in [15]. An alternative derivation of the natural gradient is given in [16], together with the natural equivalent of batch learning, linked to Levenberg-Marquardt optimization. Recently, the special case of learning for non-linear discriminant networks was improved by use of the natural gradient in [17].…”
Section: Introduction
confidence: 99%
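As a hedged sketch of the Levenberg-Marquardt link mentioned in connection with [16] (standard notation, not the paper's own): for a squared-error cost the Fisher matrix coincides, up to averaging over the data, with the Gauss-Newton approximation of the Hessian, and a damped batch natural-gradient step then takes the familiar Levenberg-Marquardt form.

    % Sketch only; Gauss-Newton/Fisher matrix and the damped batch update.
    \[
      G(w) \;\approx\; \sum_{n} \nabla_w f(x_n; w)\,\nabla_w f(x_n; w)^{\top},
      \qquad
      \Delta w \;=\; -\bigl(G(w) + \lambda I\bigr)^{-1} \nabla_w E(w).
    \]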