2018
DOI: 10.48550/arxiv.1811.10105
Preprint

Inexact SARAH Algorithm for Stochastic Optimization

Abstract: We develop and analyze a variant of the variance-reducing stochastic gradient algorithm known as SARAH [10] which does not require computation of the exact gradient. Thus, this new method can be applied to general expectation minimization problems rather than only finite-sum problems. While the original SARAH algorithm, as well as its predecessor SVRG [2], requires an exact gradient computation on each outer iteration, the inexact variant of SARAH (iSARAH), which we develop here, requires only a stochastic gradient…
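To make the algorithm described in the abstract concrete, the following is a minimal Python sketch of the iSARAH outer/inner loop structure: the outer-loop gradient estimate is a mini-batch average rather than the exact gradient required by SARAH and SVRG, and the inner loop applies the SARAH recursive estimator. The oracle names (stoch_grad, sample) and all numeric defaults are illustrative assumptions, not the parameters analyzed in the paper.

import numpy as np

def isarah(stoch_grad, sample, w0, step_size=0.01,
           outer_iters=10, inner_iters=100, batch_size=64):
    # Sketch only. Assumptions: stoch_grad(w, xi) returns the gradient of a
    # single sampled component function at w, and sample(k) draws k i.i.d.
    # data points; the step size and iteration counts are placeholders.
    w = np.asarray(w0, dtype=float)
    for _ in range(outer_iters):
        # Outer step: a mini-batch gradient estimate instead of the exact
        # full gradient used by SARAH/SVRG.
        batch = sample(batch_size)
        v = np.mean([stoch_grad(w, xi) for xi in batch], axis=0)
        w_prev, w = w, w - step_size * v
        for _ in range(inner_iters):
            # SARAH recursion: correct the running estimate with a gradient
            # difference evaluated on the same fresh sample.
            xi = sample(1)[0]
            v = stoch_grad(w, xi) - stoch_grad(w_prev, xi) + v
            w_prev, w = w, w - step_size * v
    return w

# Toy usage: minimize E_xi[0.5 * (w - xi)^2] with xi ~ N(0, 1), whose minimizer is 0.
rng = np.random.default_rng(0)
w_hat = isarah(stoch_grad=lambda w, xi: w - xi,
               sample=lambda k: rng.normal(size=k),
               w0=np.array([5.0]))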

Cited by 7 publications (17 citation statements) | References 7 publications
“…(i) Under either a Lipschitz continuity assumption on the stochastic gradient noise or smoothness and convexity assumption on the individual function, we show that ROOT-SGD achieves a nonasymptotic upper bound in terms of expected gradient norm squared that improves upon state-of-the-art results for smooth and strongly convex objective functions F by a logarithmic factor in its leading term (Theorem 1). When augmented with periodic restarting steps, our convergence rate matches the state-of-the-art rate [49] among variance-reduced stochastic approximation methods (Theorem 2).…”
Section: Summary Of Main Results (supporting)
confidence: 68%
“…In the field of smooth and convex stochastic optimization, variance-reduced gradient methods represented by, but not limited to, SAG [54], SDCA [58], SVRG [30], SCSG [36], SAGA [15], SARAH [47] and Inexact SARAH [49] have been proposed to improve the theoretical convergence rate of (stochastic) gradient descent. Under self-concordance conditions, [21] provide function-value bounds for a variant of the SVRG algorithm that matches the asymptotic behavior of the empirical risk minimizer, while the corresponding nonasymptotic rates can have worse dependency on the condition number compared to SGD.…”
Section: Further Overview Of Related Work (mentioning)
confidence: 99%
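The passages above list several variance-reduced methods; to illustrate the control-variate idea they share, here is a minimal sketch of the SVRG-style gradient estimator [30], a close relative of the SARAH recursive estimator. The names below (component_grad, w_snapshot, full_grad) and the toy least-squares setup are illustrative assumptions, not code from any of the cited papers.

import numpy as np

def svrg_gradient_estimate(component_grad, w, w_snapshot, full_grad, i):
    # Control-variate estimate of the full gradient at w: a single-component
    # gradient corrected by its value at a snapshot point where the exact
    # gradient is known; unbiased when i is drawn uniformly.
    return component_grad(w, i) - component_grad(w_snapshot, i) + full_grad

# Toy usage on the finite sum f_i(w) = 0.5 * (a_i @ w - b_i)^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_snap = np.zeros(5)
g_full = A.T @ (A @ w_snap - b) / len(b)   # exact gradient at the snapshot
v = svrg_gradient_estimate(grad_i, w_snap + 0.1, w_snap, g_full, i=3)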
“…Variance reduction. Variance Reduction (VR) techniques were originally proposed to reduce the variance in gradient estimation for stochastic gradient methods (Johnson and Zhang 2013; Defazio, Bach, and Lacoste-Julien 2014; Nguyen et al. 2017; Fang et al. 2018; Zhou, Xu, and Gu 2018; Nguyen, Scheinberg, and Takáč 2018). Several stochastic projection-free VR methods have been proposed for solving offline optimization problems (Hazan and Luo 2016; Reddi et al. 2016; Mokhtari, Hassani, and Karbasi 2018; Shen et al. 2019; Yurtsever, Sra, and Cevher 2019).…”
Section: Related Work (mentioning)
confidence: 99%
“…Although Nguyen et al. [26] pointed out that SARAH uses a larger constant step size than that of SVRG, the step size is still chosen manually. The variants of SARAH also employ a constant step size [30], [34]. In addition, Pham et al. [33] proposed proximal SARAH (ProxSARAH) for stochastic composite nonconvex optimization and showed that ProxSARAH works with new constant and adaptive step sizes, where the constant step size is much larger than in existing methods, including proximal SVRG (ProxSVRG) schemes [35] in the single-sample case, and the adaptive step sizes increase along the inner iterations rather than diminish as in stochastic proximal gradient descent methods.…”
Section: Introduction (mentioning)
confidence: 99%