2020
DOI: 10.1080/10556788.2020.1818081
Inexact SARAH algorithm for stochastic optimization

Cited by 27 publications (22 citation statements). References 5 publications.
“…Variance reduction. Variance Reduction (VR) techniques were originally proposed to reduce the variance of gradient estimates in stochastic gradient methods (Johnson and Zhang 2013; Defazio, Bach, and Lacoste-Julien 2014; Nguyen et al. 2017; Fang et al. 2018; Zhou, Xu, and Gu 2018; Nguyen, Scheinberg, and Takáč 2018). Several stochastic projection-free VR methods have been proposed for solving offline optimization problems (Hazan and Luo 2016; Reddi et al. 2016; Mokhtari, Hassani, and Karbasi 2018; Shen et al. 2019; Yurtsever, Sra, and Cevher 2019).…”
Section: Related Work
confidence: 99%
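To make the variance-reduction idea in the statement above concrete, here is a minimal sketch of the SVRG estimator of Johnson and Zhang (2013) on a toy least-squares problem. The objective, step size, and loop lengths are illustrative assumptions, not values from any of the cited papers.

```python
import numpy as np

def svrg(X, y, w0, eta=0.02, epochs=20, inner=50, seed=0):
    """Sketch of the SVRG variance-reduced gradient estimator on a
    least-squares objective: v = grad_i(w) - grad_i(w_snap) + mu,
    where mu is the full gradient at a periodic snapshot."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # per-index least-squares gradient: X_idx^T (X_idx w - y_idx) / |idx|
    g = lambda w, idx: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        mu = g(w_snap, np.arange(n))          # full gradient at the snapshot
        for _ in range(inner):
            i = [rng.integers(n)]
            v = g(w, i) - g(w_snap, i) + mu   # variance-reduced estimate
            w = w - eta * v
    return w
```

The correction term `g(w, i) - g(w_snap, i)` has zero mean, so `v` is unbiased, and its variance vanishes as the iterate approaches the snapshot.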
“…where λ > 0 is a hyperparameter that balances the weight of the regularization term; the larger λ is set, the heavier the penalty on the weights. r(w) takes different forms depending on the desired effect, including L1 regularization, L2 regularization [29], and L∞ regularization. The most common form is the L2 regularization, i.e., r(w) = ‖w‖₂, computed as √(w₁² + w₂² + ⋯ + wₙ²).…”
Section: Introduction
confidence: 99%
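A small sketch of the penalized objective described above, assuming a least-squares data term; the function name and the choice of data term are illustrative, not from the cited work.

```python
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    """Least-squares loss plus an L2 penalty lam * ||w||_2.
    lam is the hyperparameter discussed above: larger lam penalizes
    large weights more heavily."""
    residual = X @ w - y
    data_loss = 0.5 * np.mean(residual ** 2)
    # ||w||_2 = sqrt(w_1^2 + w_2^2 + ... + w_n^2)
    penalty = lam * np.sqrt(np.sum(w ** 2))
    return data_loss + penalty
```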
“…Introduction. In the past, a variety of stochastic optimization schemes have been developed, e.g., [7, 12, 13, 22, 24], in the context of optimization problems in which the expected value of a cost function j is minimized, i.e., (1.1) min…”
mentioning
confidence: 99%
“…To tackle these issues, a wide range of modified SG methods have been developed. For example, [7] uses a trust-region-type model to normalize the step lengths, whereas the iSARAH algorithm proposed in [13] combines an inner SG scheme with an outer (inexact) full-gradient descent method.…”
mentioning
confidence: 99%
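The inner/outer structure attributed to iSARAH in the statement above can be sketched as follows: the outer full gradient is replaced by an inexact minibatch estimate, and the inner loop applies the SARAH recursive update. This is an illustrative reconstruction on a toy least-squares objective; the batch size, step size, and loop lengths are assumptions, not values from [13].

```python
import numpy as np

def isarah_sketch(X, y, w0, eta=0.02, outer_iters=20, batch=32, inner=50, seed=0):
    """Sketch of the iSARAH loop structure: inexact outer gradient estimate
    plus the SARAH recursive update v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    g = lambda w, idx: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    w = w0.copy()
    for _ in range(outer_iters):
        # inexact "full" gradient: a large minibatch instead of all n samples
        v = g(w, rng.choice(n, size=min(batch, n), replace=False))
        w_prev, w = w.copy(), w - eta * v
        for _ in range(inner):
            i = [rng.integers(n)]
            v = g(w, i) - g(w_prev, i) + v    # SARAH recursive update
            w_prev, w = w.copy(), w - eta * v
    return w
```

Unlike SVRG, the recursive estimate `v` is biased, but it tracks the changing iterate without revisiting a fixed snapshot, which is what allows the outer gradient to be computed inexactly.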