Some methods of speeding up the convergence of iteration methods (Polyak, 1964)
DOI: 10.1016/0041-5553(64)90137-5

Cited by 2,007 publications (1,338 citation statements)
References: 1 publication
“…Second-order methods also amplify steps in low-curvature directions, but instead of accumulating changes they reweight the update along each eigen-direction of the curvature matrix by the inverse of the associated curvature. And just as second-order methods enjoy improved local convergence rates, Polyak (1964) showed that CM can considerably accelerate convergence to a local minimum, requiring √R times fewer iterations than steepest descent to reach the same level of accuracy, where R is the condition number of the curvature at the minimum and µ is set to (…”
Section: Momentum and Nesterov's Accelerated Gradient (mentioning)
confidence: 99%
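For orientation (this is not part of the quoted excerpt), the result being paraphrased is usually stated as follows, under the assumption that f is a strongly convex quadratic with curvature eigenvalues in [m, L] and condition number R = L/m; the symbols m and L are mine:

\theta_{t+1} = \theta_t - \varepsilon \, \nabla f(\theta_t) + \mu \, (\theta_t - \theta_{t-1}),
\qquad
\mu = \left( \frac{\sqrt{R}-1}{\sqrt{R}+1} \right)^{2},
\qquad
\varepsilon = \frac{4}{(\sqrt{L}+\sqrt{m})^{2}},

which gives a linear convergence factor of roughly (\sqrt{R}-1)/(\sqrt{R}+1) per iteration, i.e. about \sqrt{R} times fewer iterations than steepest descent for the same accuracy.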
“…The momentum method (Polyak, 1964), which we refer to as classical momentum (CM), is a technique for accelerating gradient descent that accumulates a velocity vector in directions of persistent reduction in the objective across iterations. Given an objective function f(θ) to be minimized, classical momentum is given by:…”
Section: Momentum and Nesterov's Accelerated Gradient (mentioning)
confidence: 99%
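The update that the excerpt cuts off is, in its standard form, the following; this is a minimal sketch, and the step size eps, momentum mu, and the quadratic test objective are illustrative choices of mine, not values from the source:

import numpy as np

def classical_momentum(grad, theta0, eps=0.01, mu=0.9, num_steps=200):
    """Classical momentum (CM) sketch:
        v_{t+1}     = mu * v_t - eps * grad(theta_t)
        theta_{t+1} = theta_t + v_{t+1}
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(num_steps):
        v = mu * v - eps * grad(theta)  # velocity accumulates along persistent descent directions
        theta = theta + v               # take the step
    return theta

# Usage: minimize an ill-conditioned quadratic f(theta) = 0.5 * theta^T A theta.
A = np.diag([1.0, 100.0])
theta_min = classical_momentum(lambda th: A @ th, theta0=[1.0, 1.0])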
“…Polyak (1964), Rybashov (1969) and Tsypkin (1971)) proposed the use of dynamical systems to compute the solution of various optimization problems. Specifically, Polyak (1964) investigates the idea of using a dynamical system that represents a HBF, moving under Newtonian dynamics in a conservative force field. Later, a more detailed analysis of this system was carried out by Attouch et al (2000).…”
Section: Continuous Optimization Bpm and The Conjugate Gradient Algo (mentioning)
confidence: 99%
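The HBF (heavy ball with friction) system referred to in this excerpt is commonly written as the second-order ODE below; this is a sketch for orientation, and the friction coefficient \gamma is my notation, not the excerpt's:

\ddot{x}(t) + \gamma \, \dot{x}(t) + \nabla f(x(t)) = 0, \qquad \gamma > 0,

i.e. a unit-mass ball rolling in the potential f under viscous friction; a finite-difference discretization in t recovers the classical momentum recursion quoted above.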
“…To avoid overfitting, we used an ℓ2 regularization term [6] in both layers. Momentum method [13] was used when updating weights and biases. The following other hyper-parameters were selected for pretraining: learning rate in the first layer λ = 0.01, learning rate in the second layer λ = 0.001, initial momentum µ = 0.5, momentum after fifth epoch µ = 0.9, and weight penalty ℓ2 = 2 · 10^−5.…”
Section: Datasets and Experiments (mentioning)
confidence: 99%
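Read as a schedule, the hyper-parameters in this excerpt amount to something like the sketch below; the function and variable names are mine, and only the numeric values come from the quoted text:

def pretraining_hyperparams(epoch):
    """Hyper-parameter schedule described in the excerpt (names are illustrative)."""
    return {
        "lr_layer1": 0.01,                      # learning rate, first layer
        "lr_layer2": 0.001,                     # learning rate, second layer
        "momentum": 0.5 if epoch < 5 else 0.9,  # 0.5 initially, 0.9 after the fifth epoch
        "l2_weight_penalty": 2e-5,              # l2 regularization strength
    }

def momentum_step(w, v, grad, lr, momentum, l2_penalty):
    """One momentum update with the l2 weight penalty folded into the gradient."""
    v = momentum * v - lr * (grad + l2_penalty * w)
    return w + v, v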
“…This work demonstrated that, under certain conditions, stochastic gradient descent can match the performance of Martens's Hessian-free optimizer. The conditions for these performance levels were the use of proper random initialization (which was SI in this case) and the use of the momentum method [13] or Nesterov Accelerated Gradient [12] during training. Works cited above study various forms of random initialization for networks with no pretraining.…”
Section: Related Work (mentioning)
confidence: 99%
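For contrast with the classical momentum sketch above, the Nesterov Accelerated Gradient update cited here ([12]) is commonly written with a look-ahead gradient evaluation; again a sketch, with illustrative eps and mu:

import numpy as np

def nesterov_accelerated_gradient(grad, theta0, eps=0.01, mu=0.9, num_steps=200):
    """NAG sketch: evaluate the gradient at the look-ahead point theta + mu * v,
    then update the velocity and parameters as in classical momentum."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(num_steps):
        v = mu * v - eps * grad(theta + mu * v)  # gradient at the look-ahead point
        theta = theta + v
    return theta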