2017
DOI: 10.48550/arxiv.1711.10456
Preprint

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent

Abstract: Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods," provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stationary point in Õ(1/ε^{7/4}) iterations, faster than the Õ(1/ε^2) iterations required by GD. To the best of our knowl…
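The variant of AGD studied in the paper combines Nesterov-style momentum with occasional random perturbations (and, in the paper's full algorithm, a negative-curvature-exploitation step). As a rough illustration only, not the authors' exact algorithm, a minimal sketch of accelerated gradient descent with a perturbation when the gradient becomes small might look like this; the step size eta, momentum parameter theta, threshold g_thresh, and perturbation radius r are placeholder assumptions, not the constants from the paper's analysis:

```python
import numpy as np

def perturbed_agd(grad, x0, eta=1e-3, theta=0.9, g_thresh=1e-3, r=1e-3,
                  max_iters=10_000, seed=0):
    """Illustrative sketch only: Nesterov-style momentum steps plus a small
    random perturbation whenever the gradient is small, so the iterate can
    drift off strict saddle points. Not the paper's exact PAGD algorithm,
    which also uses a negative-curvature-exploitation step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)                  # momentum (velocity) term
    for _ in range(max_iters):
        y = x + theta * v                 # look-ahead point
        g = grad(y)
        if np.linalg.norm(g) <= g_thresh:
            # Near-stationary: add a small random perturbation and
            # restart the momentum, then keep iterating.
            xi = rng.standard_normal(x.shape)
            x = x + r * xi / (np.linalg.norm(xi) + 1e-12)
            v = np.zeros_like(x)
            continue
        x_next = y - eta * g              # gradient step from the look-ahead
        v = x_next - x                    # momentum update
        x = x_next
    return x
```

For instance, `perturbed_agd(lambda x: 2 * x, np.ones(5))` drives the simple quadratic ‖x‖^2 toward its minimizer; the point of the paper is that, with carefully chosen parameters, this kind of accelerated scheme reaches a second-order stationary point in fewer iterations than plain gradient descent.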

Cited by 61 publications (79 citation statements) | References 9 publications
“…Our main contribution is a simple, single-loop, and robust gradient-based algorithm that can find an ε-approximate second-order stationary point of a smooth, Hessian-Lipschitz function f : R^n → R. Compared to previous works [3,24,29] exploiting the idea of the gradient-based Hessian power method, our algorithm has a single-looped, simpler structure and better numerical stability. Compared to the previous state-of-the-art results with single-looped structures by [21] and [19,20] using Õ(log^6 n / ε^{1.75}) or Õ(log^4 n / ε^2) iterations, our algorithm achieves a polynomial speedup in log n: Theorem 1 (informal). Our single-looped algorithm finds an ε-approximate second-order stationary point in Õ(log n / ε^{1.75}) iterations.…”
Section: Introduction (mentioning)
confidence: 89%
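Both this citation statement and the next one revolve around finding an ε-approximate second-order stationary point of a smooth, ρ-Hessian-Lipschitz function, i.e. a point with small gradient and nearly positive-semidefinite Hessian. For reference, a minimal check of that standard condition might read as follows; the explicit Hessian here is purely for illustration, since the gradient-based methods discussed in these statements certify the condition without ever forming the Hessian:

```python
import numpy as np

def is_approx_second_order_stationary(grad, hess, x, eps, rho):
    """Check the standard definition of an eps-approximate second-order
    stationary point of a rho-Hessian-Lipschitz function:
        ||grad f(x)|| <= eps  and  lambda_min(hess f(x)) >= -sqrt(rho * eps).
    Illustration only: forming the Hessian is exactly what the cited
    gradient-based algorithms avoid."""
    g = grad(x)
    H = hess(x)
    lam_min = np.linalg.eigvalsh(H).min()   # smallest Hessian eigenvalue
    return np.linalg.norm(g) <= eps and lam_min >= -np.sqrt(rho * eps)
```

For the quadratic f(x) = ‖x‖^2, `is_approx_second_order_stationary(lambda x: 2 * x, lambda x: 2 * np.eye(x.size), np.zeros(3), eps=1e-2, rho=1.0)` returns True, since the gradient vanishes and the Hessian is positive definite at the origin.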
“…A seminal work along this line was by Ge et al. [11], which found an ε-approximate second-order stationary point satisfying (1) using only gradients in O(poly(n, 1/ε)) iterations. This was later improved to an almost dimension-free Õ(log^4 n / ε^2) in the follow-up work [19], and the perturbed accelerated gradient descent algorithm [21] based on Nesterov's accelerated gradient descent [26] takes Õ(log^6 n / ε^{1.75}) iterations. However, these results still suffer from a significant overhead in terms of log n. In the other direction, Refs.…”
Section: Introduction (mentioning)
confidence: 99%
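The perturbed gradient descent idea referenced in this statement takes plain gradient steps and, once the gradient becomes small, adds a single random perturbation before continuing, so the iterates drift away from strict saddle points. A minimal sketch under assumed parameter names (eta, g_thresh, r, and t_wait are placeholders, not the tuned constants of the cited analyses):

```python
import numpy as np

def perturbed_gd(grad, x0, eta=1e-3, g_thresh=1e-3, r=1e-3,
                 t_wait=100, max_iters=10_000, seed=0):
    """Sketch of perturbed gradient descent: standard gradient steps, plus a
    random perturbation whenever the gradient is small and no perturbation
    was added in the last `t_wait` iterations. Parameter values here are
    placeholders, not the constants from the cited papers."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    last_perturb = -t_wait
    for t in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh and t - last_perturb >= t_wait:
            # Near-stationary: perturb within a small ball around x.
            xi = rng.standard_normal(x.shape)
            x = x + r * xi / (np.linalg.norm(xi) + 1e-12)
            last_perturb = t
        else:
            x = x - eta * g   # standard gradient step
    return x
```

The perturbed accelerated variant discussed in the statement replaces the plain gradient step with a momentum step, which is how the Õ(log^6 n / ε^{1.75}) iteration count improves on the Õ(log^4 n / ε^2) rate of the non-accelerated scheme.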