2019
DOI: 10.48550/arxiv.1901.07648
Preprint

Finite-Sum Smooth Optimization with SARAH

Abstract: The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function F(w) = (1/n) Σ_{i=1}^{n} f_i(w) has been proven to be at least Ω(√n/ε), where ε denotes the attained accuracy E[‖∇F(w̃)‖²] ≤ ε for the outputted approximation w̃ [6]. In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm [14,15] and achieve total complexity th…
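To make the setting concrete, here is a minimal Python sketch of a SARAH-style outer iteration for F(w) = (1/n) Σ_i f_i(w). It is illustrative only: the helper names (grad_full, grad_i), the step size eta, and the inner-loop length m are assumptions, and it is not the specific modified variant analyzed in the paper.

```python
import numpy as np

def sarah_outer_iteration(w0, grad_full, grad_i, n, eta=0.01, m=50, rng=None):
    """One SARAH outer iteration: a full-gradient snapshot, then m recursive steps.

    grad_full(w) -> (1/n) * sum_i grad f_i(w)   (costs n gradient evaluations)
    grad_i(i, w) -> gradient of the single component f_i at w
    The estimator v_t = grad f_i(w_t) - grad f_i(w_{t-1}) + v_{t-1} is biased,
    but its variance shrinks recursively as consecutive iterates get close.
    """
    rng = rng or np.random.default_rng()
    w_prev = np.asarray(w0, dtype=float)
    v = grad_full(w_prev)                 # snapshot gradient at the start of the loop
    w = w_prev - eta * v
    iterates = [w_prev.copy(), w.copy()]
    for _ in range(m):
        i = int(rng.integers(n))          # sample one component uniformly
        v = grad_i(i, w) - grad_i(i, w_prev) + v   # recursive SARAH estimator
        w_prev, w = w, w - eta * v
        iterates.append(w.copy())
    # For the nonconvex analysis, the output is typically an iterate picked at random.
    return iterates[int(rng.integers(len(iterates)))]
```

The snapshot costs n component-gradient evaluations and each inner step costs two, which is the accounting behind "total complexity" in the abstract.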

Cited by 14 publications (39 citation statements)
References 9 publications
“…The main difference between SRVRC and previous stochastic cubic regularization algorithms (Kohler and Lucchi, 2017; Xu et al., 2017; Zhou et al., 2018d,b; Wang et al., 2018b; Zhang et al., 2018a) is that SRVRC adapts new semi-stochastic gradient and semi-stochastic Hessian estimators, which are defined recursively and have smaller asymptotic variance. The use of such semi-stochastic gradients has been proved to help reduce the gradient complexity in first-order nonconvex finite-sum optimization for finding stationary points (Fang et al., 2018; Wang et al., 2018a; Nguyen et al., 2019). Our work takes one step further to apply it to the Hessian, and we will later show that it helps reduce the gradient and Hessian complexities in second-order nonconvex finite-sum optimization for finding local minima.…”
Section: Algorithm Description
confidence: 84%
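For intuition on the "defined recursively" part of the excerpt above, the sketch below gives a recursive semi-stochastic Hessian estimator by analogy with the SARAH gradient estimator. It is an assumption-laden illustration, not the exact SRVRC construction: the batch schedule, the initial estimate U0, and the helper hess_i are hypothetical.

```python
import numpy as np

def recursive_hessian_estimates(hess_i, xs, U0, batches):
    """Recursive semi-stochastic Hessian estimates along iterates xs (illustrative).

    hess_i(i, x) -> Hessian of component f_i at x (a d x d array)
    xs           -> iterates x_0, x_1, ..., x_T
    U0           -> initial estimate, e.g. a large-batch Hessian at x_0
    batches      -> index batches S_1, ..., S_T
    Recursion: U_t = mean_{i in S_t}[H_i(x_t) - H_i(x_{t-1})] + U_{t-1};
    only Hessian *differences* are sampled after the snapshot, so the added
    variance stays small whenever x_t remains close to x_{t-1}.
    """
    U = np.asarray(U0, dtype=float)
    estimates = [U.copy()]
    for t in range(1, len(xs)):
        diffs = [hess_i(i, xs[t]) - hess_i(i, xs[t - 1]) for i in batches[t - 1]]
        U = np.mean(diffs, axis=0) + U
        estimates.append(U.copy())
    return estimates
```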
“…Reddi et al. (2016); Allen-Zhu and Hazan (2016) extended SVRG to nonconvex finite-sum optimization, which is able to converge to a first-order stationary point with better gradient complexity than vanilla gradient descent. Fang et al. (2018); Zhou et al. (2018c); Wang et al. (2018a); Nguyen et al. (2019) further improved the gradient complexity for nonconvex finite-sum optimization to be (near) optimal.…”
Section: Introduction
confidence: 99%
“…IFO calls [16]. Though it obtains a theoretically attractive IFO complexity, SARAH, like other variance-reduced methods, is not as successful as expected for training neural networks.…”
Section: SARAH for Nonconvex Problems
confidence: 92%
“…As a result, the computational burden of GD is alleviated by stochastic gradients, while the gradient estimator variance can also be reduced using snapshot gradients. Members of the variance reduction family include those abbreviated as SDCA [5], SVRG [6][7][8], SAG [9], SAGA [10,11], MISO [12], S2GD [13], SCSG [14] and SARAH [15,16]. Most of these rely on the update x_{t+1} = x_t − η v_t, where η is a constant step size and v_t is a carefully designed gradient estimator that takes advantage of the snapshot gradient.…”
Section: Introduction
confidence: 99%
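The excerpt above describes the shared template x_{t+1} = x_t − η v_t, where v_t mixes a fresh stochastic gradient with a stored snapshot gradient. The sketch below shows an SVRG-style estimator of that form under assumed helper names (grad_i, full_grad_snap); SARAH instead builds v_t recursively from the previous v_{t-1}, as in the sketch under the abstract.

```python
def svrg_style_estimator(i, x, x_snap, grad_i, full_grad_snap):
    """Snapshot-based variance-reduced gradient (SVRG-style, illustrative).

    v_t = grad f_i(x_t) - grad f_i(x_snap) + (1/n) * sum_j grad f_j(x_snap)
    It is unbiased, and its variance shrinks as x_t approaches the snapshot x_snap.
    """
    return grad_i(i, x) - grad_i(i, x_snap) + full_grad_snap


def vr_step(x, v, eta):
    """Common update shared by the variance-reduction family: x_{t+1} = x_t - eta * v_t."""
    return x - eta * v
```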