2020
DOI: 10.1007/978-3-030-35502-9_9

Non-monotone Behavior of the Heavy Ball Method

Abstract: We focus on the solutions of second-order stable linear difference equations and demonstrate that their behavior can be non-monotone and exhibit peak effects depending on initial conditions. The results are applied to the analysis of an accelerated unconstrained optimization method, the Heavy Ball method. We explain the non-standard behavior of the method observed in practical applications. In addition, such non-monotonicity complicates the correct choice of parameters in optimization methods. We propose to…
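The peak effect described in the abstract can be reproduced with a small numerical sketch. The example below is not taken from the paper: the two-dimensional quadratic, the starting point, and the use of Polyak's classical step size and momentum for a quadratic are all assumptions chosen for illustration, and the function name `heavy_ball_quadratic` is hypothetical. It runs the Heavy Ball iteration x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1}) on an ill-conditioned quadratic and records the distance to the minimizer at each step.

```python
import math


def heavy_ball_quadratic(eigs, x0, alpha, beta, steps):
    """Run Heavy Ball on f(x) = 0.5 * sum(eigs[i] * x[i]**2).

    Returns the list of distances ||x_k|| to the minimizer x* = 0,
    starting from x_0 = x_{-1} = x0 (zero initial momentum).
    """
    x_prev = list(x0)
    x = list(x0)
    norms = [math.sqrt(sum(v * v for v in x))]
    for _ in range(steps):
        # The gradient of the diagonal quadratic is eigs[i] * x[i].
        x_next = [
            xi - alpha * lam * xi + beta * (xi - xpi)
            for lam, xi, xpi in zip(eigs, x, x_prev)
        ]
        x_prev, x = x, x_next
        norms.append(math.sqrt(sum(v * v for v in x)))
    return norms


# Assumed test problem: eigenvalues mu and L, condition number L / mu = 1e4.
mu, L = 1.0, 1.0e4
sqrt_mu, sqrt_L = math.sqrt(mu), math.sqrt(L)

# Polyak's classical parameter choice for quadratics (an assumption here).
alpha = 4.0 / (sqrt_L + sqrt_mu) ** 2
beta = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2

norms = heavy_ball_quadratic([mu, L], [1.0, 1.0], alpha, beta, steps=2000)
peak = max(norms)
peak_iter = norms.index(peak)

print(f"initial distance: {norms[0]:.3f}")
print(f"peak distance:    {peak:.3f} at iteration {peak_iter}")
print(f"final distance:   {norms[-1]:.3e}")
```

With these parameters the characteristic equation of each scalar recurrence has a double root of modulus sqrt(beta) < 1, so the iteration is stable, yet the distance to the minimizer first grows well above its initial value before it decays. Changing the starting point or the initial momentum changes the size of the peak, which is the dependence on initial conditions the abstract refers to.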

Cited by 7 publications (5 citation statements)
References 14 publications
“…With the use of proper values of β_2, we can realize the damping of oscillations related to the non-monotonic convergence of the method. This is typical for the case of κ ≫ 1 [13]. In Section 3, this will also be illustrated for the minimization of non-quadratic convex and non-convex functions.…”
Section: Equivalent ODE
Citation type: mentioning
confidence: 96%
“…This ODE describes the descent process in the neighborhood of x*. As can be seen from (25), the presence of β_2 realized an additional damping of oscillations associated with non-monotone convergence of the HBM [13]. 4.…”
Citation type: mentioning
confidence: 93%
“…for all k ≥ 1. Our choice of θ_k differs from the existing ones; the existing complexity analyses [16,17,21,32,34,43] of HB prohibit θ_k = 1. For example, Li and Lin [34] proposed…”
Section: Update Of Solutions
Citation type: mentioning
confidence: 99%
“…Actually, fast error/noise accumulation is a typical drawback of accelerated SGD with small batchsizes [35]. Moreover, deterministic accelerated and momentum-based methods often have non-monotone behavior (see [5] and references therein). However, to some extent clipped-SSTM suffers from the first drawback less than SSTM and has comparable convergence rate with SSTM.…”
Section: Numerical Experiments
Citation type: mentioning
confidence: 99%