Acceleration techniques for the backpropagation algorithm
1990
DOI: 10.1007/3-540-52255-7_32

Cited by 113 publications (55 citation statements). References 4 publications.
“…Since the basic version of BP is sensitive to the learning rate and momentum factor [15], several improvements were suggested by researchers: 1) a fast BP algorithm, called Quickprop, was proposed in [66,67]; 2) a delta-bar technique and an acceleration technique were suggested for tuning the BP learning rate η in [68] and [69], respectively; and 3) a variant of BP, called resilient propagation (Rprop), was proposed in [70].…”
Section: Conventional Optimization Approaches (mentioning)
confidence: 99%
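Of the variants named in the excerpt above, the Rprop-style update is the easiest to sketch in a few lines. The Python/NumPy sketch below is illustrative only: the function name, the parameter values, and the choice to skip a weight's update when its gradient sign flips follow one common Rprop variant and are assumptions here, not details taken from the cited papers.

import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    # Per-weight step sizes adapt from gradient-sign agreement only;
    # the gradient magnitude is ignored.
    agree = grad * prev_grad
    step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
    # On a sign flip, skip this weight's update for one iteration
    # (one common variant; an assumption here).
    g = np.where(agree < 0, 0.0, grad)
    return w - np.sign(g) * step, step, g

# Toy usage on E(w) = 0.5 * sum(c * w**2), whose gradient is c * w.
c = np.array([1.0, 100.0])
w = np.array([5.0, 5.0])
prev_g = np.zeros_like(w)
step = np.full_like(w, 0.1)
for _ in range(100):
    g = c * w
    w, step, prev_g = rprop_step(w, g, prev_g, step)
print(w)

Because the update depends only on gradient signs, coordinates with very different curvatures are handled with comparable effective step sizes, which is the property the excerpt credits to per-weight adaptation.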
“…We can then consider such penalties as ravines that are parallel to some axes. The so-called adaptive step size technique [9], which was originally proposed for accelerating the optimization procedure in neural network learning, can then be exploited for optimization involving such penalties. Note that for a ravine in the objective function parallel to an axis, use of an appropriate individual step size is equivalent to re-scaling the ravine.…”
Section: A Simple Approach for Optimization Involving L1 Penalties (mentioning)
confidence: 99%
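The equivalence noted in this excerpt, that an individual step size per axis behaves like re-scaling the ravine, can be seen on a toy quadratic. The snippet below is an illustrative assumption, not the adaptive rule of [9]: it simply picks each coordinate's step size inversely to an assumed-known curvature, which makes both axes contract at the same rate.

import numpy as np

# Toy ravine: curvature 1 along x1 and 100 along x2,
# f(x) = 0.5 * (1 * x1**2 + 100 * x2**2).
curvature = np.array([1.0, 100.0])

def grad(x):
    return curvature * x

x = np.array([5.0, 5.0])
# One step size per coordinate, chosen inversely to the curvature.
# This plays the role of rescaling each axis so the ravine becomes round:
# both coordinates now shrink by the same factor (1 - 0.9) per step.
step = 0.9 / curvature

for _ in range(50):
    x = x - step * grad(x)

print(x)  # both coordinates are near zero after the same number of steps

With a single shared step size small enough for the steep axis, the flat axis would instead crawl, which is exactly the ravine problem the adaptive step size technique targets.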
“…The following strategies are usually suggested: (i) start with a small learning rate and increase it exponentially if successive epochs reduce the error, or rapidly decrease it if a significant error increase occurs [3,25]; (ii) start with a small learning rate and increase it if successive epochs keep the gradient direction fairly constant, or rapidly decrease it if the direction of the gradient varies greatly at each epoch [6]; (iii) give each weight an individual learning rate, which increases if the successive changes in the weights are in the same direction and decreases otherwise [10,15,17,22]; and (iv) use a closed formula to calculate a common learning rate for all the weights at each iteration [9,12,16] or a different learning rate for each weight [7,13]. Note that all the above-mentioned strategies employ heuristic parameters in an attempt to enforce the decrease of the learning error at each iteration and to secure the convergence of the training algorithm.…”
Section: Deterministic Learning Rate Adaptation (mentioning)
confidence: 99%
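Strategy (i) from the excerpt above can be written down compactly. The sketch below is a generic "bold driver"-style rule in Python; the function name and the increase/decrease/tolerance factors are assumptions for illustration, not values from the cited works.

def adapt_global_lr(lr, err, prev_err, up=1.05, down=0.5, tolerance=1.04):
    # Strategy (i): grow the rate slightly after an epoch that reduced the error,
    # cut it sharply when the error rises noticeably, otherwise leave it unchanged.
    if err <= prev_err:
        return lr * up
    if err > tolerance * prev_err:
        return lr * down
    return lr

# Toy usage on E(w) = 0.5 * w**2, whose gradient is w.
w, lr, prev_err = 5.0, 0.1, float("inf")
for epoch in range(20):
    err = 0.5 * w * w
    lr = adapt_global_lr(lr, err, prev_err)
    w -= lr * w
    prev_err = err

The heuristic constants (here 1.05, 0.5, 1.04) are exactly the kind of hand-tuned parameters the excerpt says these strategies rely on to keep the training error decreasing.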