2019
DOI: 10.48550/arxiv.1902.00247
Preprint

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

Cong Fang,
Zhouchen Lin,
Tong Zhang

Abstract: In this paper, we give a sharp analysis for Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an (ε, O(ε^{0.5}))-approximate second-order stationary point in Õ(ε^{-3.5}) stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions. This result subverts the classical belief that SGD requires at least O(ε^{-4}) stochastic gradient c…
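To illustrate the escaping behavior described in the abstract, here is a minimal, hypothetical Python sketch (not the paper's algorithm or step-size schedule): plain SGD started exactly at the saddle point of f(x, y) = x^2 − y^2, where isotropic gradient noise stands in for the paper's dispersive-noise assumption and pushes the iterate along the negative-curvature direction.

import numpy as np

# Hypothetical illustration only: plain SGD with isotropic gradient noise on
# f(x, y) = x^2 - y^2, whose unique stationary point (0, 0) is a saddle.
# The injected noise has a component along the escape (negative-curvature)
# direction, so the iterate drifts away from the saddle instead of stalling.

def grad(w):
    x, y = w
    return np.array([2.0 * x, -2.0 * y])   # gradient of x^2 - y^2

rng = np.random.default_rng(0)
w = np.zeros(2)      # start exactly at the saddle point
eta = 0.01           # step size (hypothetical choice, not from the paper)
sigma = 0.1          # noise level (hypothetical choice)

for _ in range(500):
    g = grad(w) + sigma * rng.standard_normal(2)   # stochastic gradient
    w = w - eta * g

print(w)   # |y| has grown: SGD escaped along the negative-curvature direction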

Cited by 19 publications (37 citation statements)
References 26 publications
“…When Assumption C holds, the communication can probably be improved by the factor of ε^{-1/4} using techniques from Fang et al. [2019], which achieve Õ(ε^{-3.5}) convergence rate under Assumption C, outperforming Õ(ε^{-4}) from Jin et al. [2021] by the factor of ε^{-1/2}. When balancing the terms in Theorems 3.3 and 3.4, the communication improvement will be the square root of this value.…”
Section: Discussion (mentioning)
confidence: 99%
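The factors quoted above follow from simple arithmetic on the two rates; a hedged reconstruction, using only the Õ(ε^{-4}) and Õ(ε^{-3.5}) rates stated in the excerpt:

\[
\frac{\epsilon^{-4}}{\epsilon^{-3.5}} = \epsilon^{-1/2},
\qquad
\sqrt{\epsilon^{-1/2}} = \epsilon^{-1/4},
\]

so the ε^{-1/2} gap in gradient complexity becomes, after the term balancing the excerpt mentions, an ε^{-1/4} communication improvement.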
“…There are also a number of algorithms designed for the finite-sum setting where f(x) = ∑_{i=1}^{n} f_i(x) [Reddi et al., 2017, Allen-Zhu and Li, 2018, Fang et al., 2018], or for the case when only stochastic gradients are available [Tripuraneni et al., 2018, Jin et al., 2021], including variance reduction techniques [Allen-Zhu, 2018, Fang et al., 2018]. The sharpest rates in these settings have been obtained by Fang et al. [2018], Zhou and Gu [2019] and Fang et al. [2019].…”
Section: Related Work (mentioning)
confidence: 99%
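As a concrete (and purely illustrative) rendering of the finite-sum setting quoted above, the sketch below builds an unbiased stochastic gradient for f(x) = ∑_{i=1}^{n} f_i(x) by sampling a mini-batch of components; the quadratic choice of f_i and all parameter values are hypothetical, not taken from any of the cited works.

import numpy as np

# Hypothetical finite-sum example: f(x) = sum_{i=1}^n f_i(x) with
# f_i(x) = 0.5 * (a_i . x - b_i)^2. A stochastic gradient is formed by
# sampling a mini-batch of components -- the oracle that SGD-type and
# variance-reduced methods in the excerpt build on.

rng = np.random.default_rng(1)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def full_grad(x):
    return A.T @ (A @ x - b)               # gradient of the full sum

def stochastic_grad(x, batch_size=32):
    idx = rng.integers(0, n, size=batch_size)
    # rescale by n / batch_size so the estimator is unbiased for the sum
    return (n / batch_size) * (A[idx].T @ (A[idx] @ x - b[idx]))

x = np.zeros(d)
print(np.linalg.norm(full_grad(x) - stochastic_grad(x)))  # sampling error of the estimator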
“…• Can we achieve the polynomial speedup in log n for more advanced stochastic optimization algorithms with complexity Õ(poly(log n)/ε^{3.5}) [2, 3, 9, 28, 30] or Õ(poly(log n)/ε^{3}) [8, 33]?…”
Section: Setting (mentioning)
confidence: 99%