2019
DOI: 10.48550/arxiv.1901.07648
Preprint

Finite-Sum Smooth Optimization with SARAH

Abstract: The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function F(w) = (1/n) Σ_{i=1}^{n} f_i(w) has been proven to be at least Ω(√n/ε), where ε denotes the attained accuracy E[‖∇F(w̃)‖²] ≤ ε for the outputted approximation w̃ [6]. In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm [14,15] and achieve total complexity th…
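To make the setting concrete, here is a minimal Python sketch of a SARAH-style outer iteration for F(w) = (1/n) Σ_i f_i(w). It is illustrative only: the helper names (grad_full, grad_i), the step size eta, and the inner-loop length m are assumptions, and it is not the specific modified variant analyzed in the paper.

```python
import numpy as np

def sarah_outer_iteration(w0, grad_full, grad_i, n, eta=0.01, m=50, rng=None):
    """One SARAH outer iteration: a full-gradient snapshot, then m recursive steps.

    grad_full(w) -> (1/n) * sum_i grad f_i(w)   (costs n gradient evaluations)
    grad_i(i, w) -> gradient of the single component f_i at w
    The estimator v_t = grad f_i(w_t) - grad f_i(w_{t-1}) + v_{t-1} is biased,
    but its variance shrinks recursively as consecutive iterates get close.
    """
    rng = rng or np.random.default_rng()
    w_prev = np.asarray(w0, dtype=float)
    v = grad_full(w_prev)                 # snapshot gradient at the start of the loop
    w = w_prev - eta * v
    iterates = [w_prev.copy(), w.copy()]
    for _ in range(m):
        i = int(rng.integers(n))          # sample one component uniformly
        v = grad_i(i, w) - grad_i(i, w_prev) + v   # recursive SARAH estimator
        w_prev, w = w, w - eta * v
        iterates.append(w.copy())
    # For the nonconvex analysis, the output is typically an iterate picked at random.
    return iterates[int(rng.integers(len(iterates)))]
```

The snapshot costs n component-gradient evaluations and each inner step costs two, which is the accounting behind "total complexity" in the abstract.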

Cited by 14 publications (39 citation statements)
References 9 publications
“…The main difference between SRVRC and previous stochastic cubic regularization algorithms (Kohler and Lucchi, 2017; Xu et al., 2017; Zhou et al., 2018d,b; Wang et al., 2018b; Zhang et al., 2018a) is that SRVRC adapts new semi-stochastic gradient and semi-stochastic Hessian estimators, which are defined recursively and have smaller asymptotic variance. The use of such semi-stochastic gradients has been proved to help reduce the gradient complexity in first-order nonconvex finite-sum optimization for finding stationary points (Fang et al., 2018; Wang et al., 2018a; Nguyen et al., 2019). Our work takes one step further to apply it to the Hessian, and we will later show that it helps reduce the gradient and Hessian complexities in second-order nonconvex finite-sum optimization for finding local minima.…”
Section: Algorithm Description
confidence: 84%
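For intuition on the "defined recursively" part of the excerpt above, the sketch below gives a recursive semi-stochastic Hessian estimator by analogy with the SARAH gradient estimator. It is an assumption-laden illustration, not the exact SRVRC construction: the batch schedule, the initial estimate U0, and the helper hess_i are hypothetical.

```python
import numpy as np

def recursive_hessian_estimates(hess_i, xs, U0, batches):
    """Recursive semi-stochastic Hessian estimates along iterates xs (illustrative).

    hess_i(i, x) -> Hessian of component f_i at x (a d x d array)
    xs           -> iterates x_0, x_1, ..., x_T
    U0           -> initial estimate, e.g. a large-batch Hessian at x_0
    batches      -> index batches S_1, ..., S_T
    Recursion: U_t = mean_{i in S_t}[H_i(x_t) - H_i(x_{t-1})] + U_{t-1};
    only Hessian *differences* are sampled after the snapshot, so the added
    variance stays small whenever x_t remains close to x_{t-1}.
    """
    U = np.asarray(U0, dtype=float)
    estimates = [U.copy()]
    for t in range(1, len(xs)):
        diffs = [hess_i(i, xs[t]) - hess_i(i, xs[t - 1]) for i in batches[t - 1]]
        U = np.mean(diffs, axis=0) + U
        estimates.append(U.copy())
    return estimates
```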
“…Reddi et al. (2016); Allen-Zhu and Hazan (2016) extended SVRG to nonconvex finite-sum optimization, which is able to converge to a first-order stationary point with better gradient complexity than vanilla gradient descent. Fang et al. (2018); Zhou et al. (2018c); Wang et al. (2018a); Nguyen et al. (2019) further improved the gradient complexity for nonconvex finite-sum optimization to be (near) optimal.…”
Section: Introduction
confidence: 99%
“…IFO calls [16]. Though it obtains a theoretically attractive IFO complexity, SARAH, like other variance-reduced methods, is not as successful as expected for training neural networks.…”
Section: SARAH for Nonconvex Problems
confidence: 92%
“…As a result, the computational burden of GD is alleviated by stochastic gradients, while the gradient estimator variance can also be reduced using snapshot gradients. Members of the variance reduction family include those abbreviated as SDCA [5], SVRG [6][7][8], SAG [9], SAGA [10,11], MISO [12], S2GD [13], SCSG [14] and SARAH [15,16]. Most of these rely on the update x_{t+1} = x_t − η v_t, where η is a constant step size and v_t is a carefully designed gradient estimator that takes advantage of the snapshot gradient.…”
Section: Introduction
confidence: 99%
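The excerpt above describes the shared template x_{t+1} = x_t − η v_t, where v_t mixes a fresh stochastic gradient with a stored snapshot gradient. The sketch below shows an SVRG-style estimator of that form under assumed helper names (grad_i, full_grad_snap); SARAH instead builds v_t recursively from the previous v_{t-1}, as in the sketch under the abstract.

```python
def svrg_style_estimator(i, x, x_snap, grad_i, full_grad_snap):
    """Snapshot-based variance-reduced gradient (SVRG-style, illustrative).

    v_t = grad f_i(x_t) - grad f_i(x_snap) + (1/n) * sum_j grad f_j(x_snap)
    It is unbiased, and its variance shrinks as x_t approaches the snapshot x_snap.
    """
    return grad_i(i, x) - grad_i(i, x_snap) + full_grad_snap


def vr_step(x, v, eta):
    """Common update shared by the variance-reduction family: x_{t+1} = x_t - eta * v_t."""
    return x - eta * v
```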