2019
DOI: 10.48550/arxiv.1902.04811
Preprint

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

Chi Jin,
Praneeth Netrapalli,
Rong Ge
et al.

Abstract: Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a gap has arisen between theory and practice. Indeed, traditional analyses of GD and SGD show that both algorithms converge to stationary points efficiently. But these analyses do not take into account th…
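The abstract is truncated above. As context for the saddle-point discussion, here is a minimal sketch of the perturbed-gradient-descent idea this line of work studies: inject a small random perturbation whenever the gradient is nearly zero, so the iterate can escape strict saddle points. This is an illustration only, with assumed step size, noise radius, and toy objective, not the authors' exact algorithm.

```python
# Minimal sketch of perturbed gradient descent (illustrative only, not the paper's
# exact algorithm or step-size schedule). Assumption: `grad` returns the gradient
# of a smooth nonconvex objective at x.
import numpy as np

def perturbed_gd(grad, x0, eta=1e-2, eps=1e-3, radius=1e-2, max_iters=5000, seed=0):
    """Gradient descent that injects a small random perturbation whenever the
    gradient is nearly zero, so iterates can escape strict saddle points."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            # Near a first-order stationary point: perturb inside a small ball.
            u = rng.normal(size=x.shape)
            x = x + radius * u / np.linalg.norm(u)
        else:
            x = x - eta * g
    return x

# Toy example: f(x, y) = x^2 + y^4/4 - y^2/2 has a strict saddle at the origin
# and minima at (0, +1) and (0, -1).
toy_grad = lambda z: np.array([2.0 * z[0], z[1] ** 3 - z[1]])
print(perturbed_gd(toy_grad, np.zeros(2)))  # ends up near one of the minima
```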

Cited by 21 publications (65 citation statements)
References 20 publications (36 reference statements)
“…Our main contribution is a simple, single-loop, and robust gradient-based algorithm that can find an ε-approximate second-order stationary point of a smooth, Hessian Lipschitz function f : R^n → R. Compared to previous works [3,24,29] exploiting the idea of gradient-based Hessian power method, our algorithm has a single-looped, simpler structure and better numerical stability. Compared to the previous state-of-the-art results with single-looped structures by [21] and [19,20] using Õ(log^6 n / ε^{1.75}) or Õ(log^4 n / ε^2) iterations, our algorithm achieves a polynomial speedup in log n: Theorem 1 (informal). Our single-looped algorithm finds an ε-approximate second-order stationary point in Õ(log n / ε^{1.75}) iterations.…”
Section: Introduction (mentioning)
confidence: 89%
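For reference, the ε-approximate second-order stationary points discussed in this statement are usually defined, for a ρ-Hessian-Lipschitz function f, by the two conditions below. This is the standard definition supplied for context, not text from the citing paper.

```latex
% Standard definition of an \epsilon-approximate second-order stationary point
% of a \rho-Hessian-Lipschitz function f (supplied for context).
\[
\|\nabla f(x)\| \le \epsilon
\qquad\text{and}\qquad
\lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\sqrt{\rho\,\epsilon}.
\]
```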
“…We further assume that the stochastic gradients are Lipschitz (or equivalently, the underlying functions are gradient-Lipschitz, see Assumption 2), which is also adopted in most of the existing works; see e.g. [8,19,20,34]. We demonstrate that a simple extended version of our algorithm takes O(log^2 n) iterations to detect a negative curvature direction using only stochastic gradients, and then obtain an Ω(1) function value decrease with high probability.…”
Section: Introduction (mentioning)
confidence: 97%
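The negative-curvature detection mentioned here, like the gradient-based Hessian power method referenced in the first statement, rests on approximating Hessian-vector products with finite differences of gradients. Below is a minimal sketch of that building block, using exact gradients for simplicity; the function names, shift constant L, and iteration count are illustrative assumptions, not code from either paper.

```python
# Sketch of gradient-based negative-curvature detection: power iteration on the
# shifted Hessian (L*I - H), using only gradient evaluations. Illustrative only;
# `grad`, `L`, and the iteration count are assumptions, not the papers' choices.
import numpy as np

def hvp(grad, x, v, delta=1e-5):
    """Approximate the Hessian-vector product H(x) v by a finite difference of gradients."""
    return (grad(x + delta * v) - grad(x - delta * v)) / (2.0 * delta)

def negative_curvature_direction(grad, x, L=10.0, iters=200, seed=0):
    """Power iteration on (L*I - H(x)): when L upper-bounds the spectral norm of
    H(x) (e.g. the gradient-Lipschitz constant), its top eigenvector is the
    direction of most negative curvature of H(x)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = L * v - hvp(grad, x, v)
        v /= np.linalg.norm(v)
    curvature = v @ hvp(grad, x, v)  # Rayleigh quotient along the returned direction
    return v, curvature

# At the saddle (0, 0) of f(x, y) = x^2 - y^2, the negative curvature direction is ±e_2.
g = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])
v, curv = negative_curvature_direction(g, np.zeros(2))
print(v, curv)  # roughly (0, ±1) with curvature close to -2
```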
“…Motivated by recent work on escaping saddle points (Ge et al., 2015; Lee et al., 2016; Jin et al., 2019), one can show that the SSGD algorithm equipped with the aforementioned artificial noise injection escapes from all saddle points, and hence the initialization condition (14) can be dropped. First, we generalize Assumption 2.1 for local convergence to the following for global convergence:…”
Section: Global Convergence Analysis (mentioning)
confidence: 99%