2018
DOI: 10.48550/arxiv.1811.10105
Preprint

Inexact SARAH Algorithm for Stochastic Optimization

Abstract: We develop and analyze a variant of the variance-reducing stochastic gradient algorithm known as SARAH [10] which does not require computation of the exact gradient. Thus, this new method can be applied to general expectation minimization problems rather than only finite-sum problems. While the original SARAH algorithm, as well as its predecessor SVRG [2], requires an exact gradient computation on each outer iteration, the inexact variant of SARAH (iSARAH), which we develop here, requires only a stochastic gradient…
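To make the algorithm described in the abstract concrete, the following is a minimal Python sketch of the iSARAH outer/inner loop structure: the outer-loop gradient estimate is a mini-batch average rather than the exact gradient required by SARAH and SVRG, and the inner loop applies the SARAH recursive estimator. The oracle names (stoch_grad, sample) and all numeric defaults are illustrative assumptions, not the parameters analyzed in the paper.

import numpy as np

def isarah(stoch_grad, sample, w0, step_size=0.01,
           outer_iters=10, inner_iters=100, batch_size=64):
    # Sketch only. Assumptions: stoch_grad(w, xi) returns the gradient of a
    # single sampled component function at w, and sample(k) draws k i.i.d.
    # data points; the step size and iteration counts are placeholders.
    w = np.asarray(w0, dtype=float)
    for _ in range(outer_iters):
        # Outer step: a mini-batch gradient estimate instead of the exact
        # full gradient used by SARAH/SVRG.
        batch = sample(batch_size)
        v = np.mean([stoch_grad(w, xi) for xi in batch], axis=0)
        w_prev, w = w, w - step_size * v
        for _ in range(inner_iters):
            # SARAH recursion: correct the running estimate with a gradient
            # difference evaluated on the same fresh sample.
            xi = sample(1)[0]
            v = stoch_grad(w, xi) - stoch_grad(w_prev, xi) + v
            w_prev, w = w, w - step_size * v
    return w

# Toy usage: minimize E_xi[0.5 * (w - xi)^2] with xi ~ N(0, 1), whose minimizer is 0.
rng = np.random.default_rng(0)
w_hat = isarah(stoch_grad=lambda w, xi: w - xi,
               sample=lambda k: rng.normal(size=k),
               w0=np.array([5.0]))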

Cited by 7 publications (17 citation statements) | References 7 publications
“…(i) Under either a Lipschitz continuity assumption on the stochastic gradient noise or smoothness and convexity assumption on the individual function, we show that ROOT-SGD achieves a nonasymptotic upper bound in terms of expected gradient norm squared that improves upon state-of-the-art results for smooth and strongly convex objective functions F by a logarithmic factor in its leading term (Theorem 1). When augmented with periodic restarting steps, our convergence rate matches the state-of-the-art rate [49] among variance-reduced stochastic approximation methods (Theorem 2).…”
Section: Summary Of Main Results (supporting)
confidence: 68%
“…In the field of smooth and convex stochastic optimization, variance-reduced gradient methods represented by, but not limited to, SAG [54], SDCA [58], SVRG [30], SCSG [36], SAGA [15], SARAH [47] and Inexact SARAH [49] have been proposed to improve the theoretical convergence rate of (stochastic) gradient descent. Under self-concordance conditions, [21] provide function-value bounds for a variant of the SVRG algorithm that matches the asymptotic behavior of the empirical risk minimizer, while the corresponding nonasymptotic rates can have worse dependency on the condition number compared to SGD.…”
Section: Further Overview Of Related Work (mentioning)
confidence: 99%
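The passages above list several variance-reduced methods; to illustrate the control-variate idea they share, here is a minimal sketch of the SVRG-style gradient estimator [30], a close relative of the SARAH recursive estimator. The names below (component_grad, w_snapshot, full_grad) and the toy least-squares setup are illustrative assumptions, not code from any of the cited papers.

import numpy as np

def svrg_gradient_estimate(component_grad, w, w_snapshot, full_grad, i):
    # Control-variate estimate of the full gradient at w: a single-component
    # gradient corrected by its value at a snapshot point where the exact
    # gradient is known; unbiased when i is drawn uniformly.
    return component_grad(w, i) - component_grad(w_snapshot, i) + full_grad

# Toy usage on the finite sum f_i(w) = 0.5 * (a_i @ w - b_i)^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_snap = np.zeros(5)
g_full = A.T @ (A @ w_snap - b) / len(b)   # exact gradient at the snapshot
v = svrg_gradient_estimate(grad_i, w_snap + 0.1, w_snap, g_full, i=3)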
“…Variance reduction. Variance Reduction (VR) techniques were originally proposed to reduce the variance in gradient estimation for stochastic gradient methods (Johnson and Zhang 2013; Defazio, Bach, and Lacoste-Julien 2014; Nguyen et al. 2017; Fang et al. 2018; Zhou, Xu, and Gu 2018; Nguyen, Scheinberg, and Takáč 2018). Several stochastic projection-free VR methods have been proposed for solving offline optimization problems (Hazan and Luo 2016; Reddi et al. 2016; Mokhtari, Hassani, and Karbasi 2018; Shen et al. 2019; Yurtsever, Sra, and Cevher 2019).…”
Section: Related Work (mentioning)
confidence: 99%
“…Although Nguyen et al. [26] pointed out that SARAH uses a larger constant step size than that of SVRG, the step size is still chosen manually. The variants of SARAH also employ a constant step size [30], [34]. In addition, Pham et al. [33] proposed proximal SARAH (ProxSARAH) for stochastic composite nonconvex optimization and showed that ProxSARAH works with new constant and adaptive step sizes, where the constant step size is much larger than in existing methods, including proximal SVRG (ProxSVRG) schemes [35] in the single-sample case, and the adaptive step sizes increase along the inner iterations rather than diminish as in stochastic proximal gradient descent methods.…”
Section: Introduction (mentioning)
confidence: 99%