2019
DOI: 10.48550/arxiv.1906.02351
Preprint

On the Convergence of SARAH and Beyond

Abstract: The main theme of this work is a unifying algorithm, abbreviated as L2S, that can deal with (strongly) convex and nonconvex empirical risk minimization (ERM) problems. It broadens a recently developed variance reduction method known as SARAH. L2S enjoys a linear convergence rate for strongly convex problems, which also implies that the last iterate of SARAH's inner loop converges linearly. For convex problems, different from SARAH, L2S can afford step and mini-batch sizes not dependent on the data size n, and the…
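Since the abstract references SARAH without restating it, here is a minimal runnable sketch of the vanilla SARAH recursion of Nguyen et al. [2017] that L2S builds on; the least-squares instance, step size, and inner-loop length are illustrative assumptions, and the L2S modifications themselves are not reproduced here.

```python
import numpy as np

def sarah(grad_i, full_grad, x0, n, eta, m, outer_iters, rng):
    """Vanilla SARAH (Nguyen et al., 2017): a full snapshot gradient starts each
    outer loop, and the inner loop maintains the recursive estimate
    v_k = grad_i(x_k) - grad_i(x_{k-1}) + v_{k-1}."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(outer_iters):
        v = full_grad(x)                 # snapshot gradient anchoring this outer loop
        x_prev, x = x, x - eta * v
        for _ in range(m):
            i = rng.integers(n)          # one uniformly sampled component
            v = grad_i(i, x) - grad_i(i, x_prev) + v
            x_prev, x = x, x - eta * v
        # restart the next outer loop from the last inner iterate
    return x

# Illustrative least-squares ERM instance (assumed, not from the paper).
rng = np.random.default_rng(0)
n, d = 200, 10
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]    # gradient of (1/2)(a_i^T x - b_i)^2
full_grad = lambda x: A.T @ (A @ x - b) / n
L = np.linalg.eigvalsh(A.T @ A / n).max()         # smoothness constant
x_hat = sarah(grad_i, full_grad, np.zeros(d), n, eta=0.5 / L, m=2 * n,
              outer_iters=30, rng=rng)
```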

Cited by 6 publications (12 citation statements)
References 15 publications (37 reference statements)
“…The complexity bound in Theorem 1 improves at least the dependence over all other SARAH variants for convex problems in the literature. So far, the best known complexity of SARAH for convex problems is the one in [Li et al., 2019a]. Comparing now with the complexity of vanilla SARAH, the bound in Theorem 1 depends on the summation of n and 1/√ε rather than their product.…”
Section: SARAH With a Single Regularizer (mentioning)
confidence: 99%
“…It can be seen that we have η_min = 1/(m(L+λ)) ≤ η^(s) ≤ 1/(mλ) = η_max. As directly analyzing SARAH requires extra assumptions, as in [Nguyen et al., 2017, 2018], we will focus on L2S [Li et al., 2019a] equipped with (14).…”
Section: B.5 Proof for Corollary (mentioning)
confidence: 99%
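As a small worked illustration of the quoted step-size range (an assumption about how such a bound would be enforced, not code from either paper), a candidate per-epoch step size, e.g. a Barzilai-Borwein estimate, can be projected onto [1/(m(L+λ)), 1/(mλ)]:

```python
def clip_step_size(eta_candidate, m, L, lam):
    """Project a candidate per-epoch step size onto the quoted range
        1 / (m * (L + lam)) <= eta^(s) <= 1 / (m * lam),
    where m is the inner-loop length, L the smoothness constant, and lam the
    regularization (strong-convexity) parameter. Hypothetical helper."""
    eta_min = 1.0 / (m * (L + lam))
    eta_max = 1.0 / (m * lam)
    return min(max(eta_candidate, eta_min), eta_max)
```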
“…One idea is to judiciously evaluate a so-termed snapshot gradient ∇f(x_s), and use it as an anchor for the stochastic draws in subsequent iterations. Members of the variance reduction family include schemes abbreviated as SDCA [Shalev-Shwartz and Zhang, 2013], SVRG [Johnson and Zhang, 2013], SAG [Roux et al., 2012], SAGA [Defazio et al., 2014], MISO [Mairal, 2013], SARAH [Nguyen et al., 2017], and their variants [Konečný and Richtárik, 2013, Lei et al., 2017, Li et al., 2019, Kovalev et al., 2019]. Most of these algorithms rely on the update x_{k+1} = x_k − η v_k, where η is a constant step size and v_k is an algorithm-specific gradient estimate that takes advantage of the snapshot gradient.…”
Section: Introduction (mentioning)
confidence: 99%
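To make the generic update x_{k+1} = x_k − η v_k concrete, the sketch below instantiates v_k with the SVRG estimator of Johnson and Zhang [2013], in which the snapshot gradient anchors every stochastic draw; the other schemes listed above differ mainly in how v_k is formed. The data and constants are illustrative assumptions.

```python
import numpy as np

def svrg_epoch(grad_i, full_grad, x_snap, n, eta, m, rng):
    """One SVRG outer iteration: v_k = grad_i(x_k) - grad_i(x_snap) + grad_f(x_snap)
    is unbiased, and its variance shrinks as x_k approaches the snapshot x_snap."""
    g_snap = full_grad(x_snap)          # snapshot gradient, computed once per epoch
    x = x_snap.copy()
    for _ in range(m):
        i = rng.integers(n)
        v = grad_i(i, x) - grad_i(i, x_snap) + g_snap
        x = x - eta * v                 # the generic update x_{k+1} = x_k - eta * v_k
    return x

# Illustrative least-squares instance (assumed).
rng = np.random.default_rng(1)
n, d = 100, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
full_grad = lambda x: A.T @ (A @ x - b) / n
x = np.zeros(d)
for _ in range(20):
    x = svrg_epoch(grad_i, full_grad, x, n, eta=0.1, m=2 * n, rng=rng)
```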
“…Along with auto-tuned BB step sizes, this paper establishes that in order to obtain 'tune-free' SVRG and SARAH schemes, one must: i) develop novel types of gradient averaging adaptive to the chosen step size; and, ii) adjust the inner-loop length along with the step size. Averaging in iterative solvers with reduced-variance gradient estimators is effected by choosing the starting point of the next outer loop [Johnson and Zhang, 2013, Tan et al., 2016, Nguyen et al., 2017, Li et al., 2019]. The types of averaging considered so far have been employed as tricks to simplify proofs, while in the algorithm itself only the last iterate is selected as the starting point of the ensuing outer loop.…”
Section: Introduction (mentioning)
confidence: 99%
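A minimal sketch of the averaging mechanism described here, under the assumption that the inner loop records its iterates: the "averaging" used in proofs amounts to restarting the next outer loop from a randomly drawn inner iterate (the randomized snapshot choice analyzed for SVRG), whereas implementations typically restart from the last one. The helper and the `run_inner_loop` name are hypothetical.

```python
import numpy as np

def next_starting_point(inner_iterates, rule, rng):
    """Choose where the next outer loop starts from a list of inner iterates.
    rule='random': uniformly drawn inner iterate -- the averaging-style choice
                   used in many convergence proofs;
    rule='last'  : last inner iterate -- the choice typically implemented."""
    if rule == "random":
        return inner_iterates[rng.integers(len(inner_iterates))]
    if rule == "last":
        return inner_iterates[-1]
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical usage, with run_inner_loop standing for any inner loop that
# returns the list [x_0, ..., x_m] of its iterates:
#   iterates = run_inner_loop(x_start)
#   x_start = next_starting_point(iterates, rule="last", rng=np.random.default_rng(2))
```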