2019
DOI: 10.48550/arxiv.1902.05679
Preprint

ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization

Nhan H. Pham,
Lam M. Nguyen,
Dzung T. Phan
et al.

Abstract: We propose a new stochastic first-order algorithmic framework to solve stochastic composite nonconvex optimization problems that covers both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al., 2017a) and consist of two steps: a proximal gradient step and an averaging step, which makes them different from existing nonconvex proximal-type algorithms. The algorithms only require an average smoothness assumption of the nonconvex objective term and an additional bounded v…
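To make the two steps described in the abstract concrete, here is a minimal Python sketch of one inner iteration of a ProxSARAH-style update. It assumes an l1 regularizer as the composite term (so its proximal operator is soft-thresholding); the names (grad_fn, eta, gamma, lam) and the l1 choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1 (illustrative composite term).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sarah_inner_step(x, x_prev, v, grad_fn, batch, eta, gamma, lam):
    """One inner iteration of a ProxSARAH-style update (sketch).

    x, x_prev  -- current and previous iterates
    v          -- previous SARAH-type estimator of the smooth-part gradient
                  (initialize with a full or large-batch gradient at the snapshot)
    grad_fn    -- grad_fn(point, batch): mini-batch gradient of the smooth term
    batch      -- mini-batch indices, reused at both points for variance reduction
    eta, gamma -- proximal and averaging step sizes (constant or adaptive)
    lam        -- weight of the assumed l1 composite term
    """
    # Recursive (biased) SARAH-type estimator update on the same mini-batch.
    v = grad_fn(x, batch) - grad_fn(x_prev, batch) + v
    # Proximal gradient step on the current estimator.
    x_hat = soft_threshold(x - eta * v, eta * lam)
    # Averaging step: move only a gamma-fraction toward the proximal point.
    x_next = (1.0 - gamma) * x + gamma * x_hat
    return x_next, x, v
```

In the paper the step sizes eta and gamma are either constants or follow an adaptive schedule; the sketch simply treats them as given inputs.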

Cited by 12 publications (53 citation statements)
References 24 publications

“…The algorithms SARAH, prox-SARAH [19], SPIDER [11], SPIDERBoost [25] and SPIDER-M [28] use this gradient estimator.…”
Section: arXiv:1906.01133v1 [math.OC] 4 Jun 2019 (mentioning)
confidence: 99%
“…However, there are notable exceptions that suggest biased algorithms are worth further consideration. Recently, [11,19,25,28] proved that algorithms using the SARAH gradient estimator achieve the oracle complexity lower bound of O(√n/ε²) for non-convex composite optimisation. For comparison, the best complexity proved for SAGA and SVRG in this setting is O(n^{2/3}/ε²).…”
Section: arXiv:1906.01133v1 [math.OC] 4 Jun 2019 (mentioning)
confidence: 99%
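For readability, the two rates quoted above can be written out in LaTeX (with ε denoting the target accuracy for an approximate stationary point, a symbol the text extraction appears to have dropped):

```latex
% SARAH-type estimators (SARAH, ProxSARAH, SPIDER, SPIDERBoost, SPIDER-M)
% match the oracle-complexity lower bound, whereas the best known rates for
% SAGA and SVRG in the nonconvex composite setting scale worse in n:
\mathcal{O}\!\left(\frac{\sqrt{n}}{\epsilon^{2}}\right)
\quad\text{(SARAH-type, matches the lower bound)}
\qquad\text{vs.}\qquad
\mathcal{O}\!\left(\frac{n^{2/3}}{\epsilon^{2}}\right)
\quad\text{(best known for SAGA/SVRG)}
```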
“…1, which necessitates SARAH's 'non-divergent' assumption. A few works have identified this issue [16,18,21], but require an n-related mini-batch size or step size. The proposed L2S bypasses this n-dependence by removing the inner loop of SARAH and computing snapshot gradients following a random schedule.…”
Section: Loopless SARAH for Convex Problems (mentioning)
confidence: 99%
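A minimal sketch of the loopless idea quoted above, assuming a common coin-flip schedule (with probability p, refresh the estimator with a full snapshot gradient; otherwise apply the recursive SARAH update). The schedule and the names (p, full_grad, stoch_grad) are illustrative assumptions, not details taken from the L2S paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def loopless_sarah_estimator(x, x_prev, v, full_grad, stoch_grad, batch, p):
    """Loopless SARAH-style estimator update (sketch).

    With probability p, recompute a full snapshot gradient at the current point,
    which replaces SARAH's fixed-length inner loop; otherwise apply the usual
    recursive update on a shared mini-batch.
    """
    if rng.random() < p:
        # Random snapshot: full gradient refresh.
        return full_grad(x)
    # Recursive SARAH update, evaluating the same mini-batch at both points.
    return stoch_grad(x, batch) - stoch_grad(x_prev, batch) + v
```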
“…Among variance reduction algorithms, the distinct feature of SARAH [15,16] and its variants [18][19][20][21] is that they rely on a biased gradient estimator v_t formed by recursively using stochastic gradients. SARAH performs comparably to SVRG for strongly convex ERM, but outperforms SVRG for nonconvex losses, while unlike SAGA, it does not require storing a gradient table.…”
Section: Introduction (mentioning)
confidence: 99%
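The recursive, biased estimator referred to in this quote is the SARAH estimator of Nguyen et al. (2017), written out here for reference (with f_{i_t} the component sampled at step t and v_0 initialized to a full gradient at the snapshot point):

```latex
v_t \;=\; \nabla f_{i_t}(x_t) \;-\; \nabla f_{i_t}(x_{t-1}) \;+\; v_{t-1},
\qquad v_0 \;=\; \nabla f(x_0).
% Biased: conditioned on the past, E[v_t] = \nabla f(x_t) - \nabla f(x_{t-1}) + v_{t-1},
% which is generally not equal to \nabla f(x_t).
```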
“…The variants of SARAH also employed a constant step size [30], [34]. In addition, Pham et al. [33] proposed proximal SARAH (ProxSARAH) for stochastic composite nonconvex optimization and showed that ProxSARAH works with new constant and adaptive step sizes, where the constant step size is much larger than in existing methods, including proximal SVRG (ProxSVRG) schemes [35] in the single-sample case, and the adaptive step sizes increase along the inner iterations rather than diminishing as in stochastic proximal gradient descent methods. However, it is complicated to compute the adaptive step size for ProxSARAH.…”
Section: Introduction (mentioning)
confidence: 99%