2020
DOI: 10.1080/10556788.2020.1818081
Inexact SARAH algorithm for stochastic optimization

Cited by 27 publications (22 citation statements). References 5 publications.
“…Variance reduction. Variance Reduction (VR) techniques were originally proposed to reduce the variance of gradient estimates in stochastic gradient methods (Johnson and Zhang 2013; Defazio, Bach, and Lacoste-Julien 2014; Nguyen et al. 2017; Fang et al. 2018; Zhou, Xu, and Gu 2018; Nguyen, Scheinberg, and Takáč 2018). Several stochastic projection-free VR methods have been proposed for solving offline optimization problems (Hazan and Luo 2016; Reddi et al. 2016; Mokhtari, Hassani, and Karbasi 2018; Shen et al. 2019; Yurtsever, Sra, and Cevher 2019).…”
Section: Related Work
confidence: 99%
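To make the variance-reduction idea in the statement above concrete, here is a minimal sketch of the SVRG estimator of Johnson and Zhang (2013) on a toy least-squares problem. The objective, step size, and loop lengths are illustrative assumptions, not values from any of the cited papers.

```python
import numpy as np

def svrg(X, y, w0, eta=0.02, epochs=20, inner=50, seed=0):
    """Sketch of the SVRG variance-reduced gradient estimator on a
    least-squares objective: v = grad_i(w) - grad_i(w_snap) + mu,
    where mu is the full gradient at a periodic snapshot."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # per-index least-squares gradient: X_idx^T (X_idx w - y_idx) / |idx|
    g = lambda w, idx: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        mu = g(w_snap, np.arange(n))          # full gradient at the snapshot
        for _ in range(inner):
            i = [rng.integers(n)]
            v = g(w, i) - g(w_snap, i) + mu   # variance-reduced estimate
            w = w - eta * v
    return w
```

The correction term `g(w, i) - g(w_snap, i)` has zero mean, so `v` is unbiased, and its variance vanishes as the iterate approaches the snapshot.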
“…where λ > 0 is a hyperparameter that balances the weight of the regularization term; the larger λ is set, the heavier the penalty on the weights. r(w) takes different forms depending on the desired effect, including L1 regularization, L2 regularization [29], and L∞ regularization. The most common form is the L2 regularization, i.e., r(w) = ‖w‖₂, computed as √(w₁² + w₂² + ⋯ + wₙ²).…”
Section: Introduction
confidence: 99%
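A small sketch of the penalized objective described above, assuming a least-squares data term; the function name and the choice of data term are illustrative, not from the cited work.

```python
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    """Least-squares loss plus an L2 penalty lam * ||w||_2.
    lam is the hyperparameter discussed above: larger lam penalizes
    large weights more heavily."""
    residual = X @ w - y
    data_loss = 0.5 * np.mean(residual ** 2)
    # ||w||_2 = sqrt(w_1^2 + w_2^2 + ... + w_n^2)
    penalty = lam * np.sqrt(np.sum(w ** 2))
    return data_loss + penalty
```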
“…Introduction. In the past, a variety of stochastic optimization schemes have been developed, e.g., [7, 12, 13, 22, 24], in the context of optimization problems in which the expected value of a cost function j is minimized, i.e., (1.1) min…”
mentioning
confidence: 99%
“…To tackle these issues, a wide range of modified SG methods have been developed. For example, [7] uses a trust-region-type model to normalize the step lengths, whereas the iSARAH algorithm proposed in [13] combines an inner SG scheme with an outer (inexact) full-gradient descent method.…”
mentioning
confidence: 99%
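The inner/outer structure attributed to iSARAH in the statement above can be sketched as follows: the outer full gradient is replaced by an inexact minibatch estimate, and the inner loop applies the SARAH recursive update. This is an illustrative reconstruction on a toy least-squares objective; the batch size, step size, and loop lengths are assumptions, not values from [13].

```python
import numpy as np

def isarah_sketch(X, y, w0, eta=0.02, outer_iters=20, batch=32, inner=50, seed=0):
    """Sketch of the iSARAH loop structure: inexact outer gradient estimate
    plus the SARAH recursive update v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    g = lambda w, idx: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    w = w0.copy()
    for _ in range(outer_iters):
        # inexact "full" gradient: a large minibatch instead of all n samples
        v = g(w, rng.choice(n, size=min(batch, n), replace=False))
        w_prev, w = w.copy(), w - eta * v
        for _ in range(inner):
            i = [rng.integers(n)]
            v = g(w, i) - g(w_prev, i) + v    # SARAH recursive update
            w_prev, w = w.copy(), w - eta * v
    return w
```

Unlike SVRG, the recursive estimate `v` is biased, but it tracks the changing iterate without revisiting a fixed snapshot, which is what allows the outer gradient to be computed inexactly.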