2018
DOI: 10.48550/arxiv.1809.02864
Preprint

Online Adaptive Methods, Universality and Acceleration

Kfir Y. Levy,
Alp Yurtsever,
Volkan Cevher

Abstract: We present a novel method for convex unconstrained optimization that, without any modifications, ensures: (i) accelerated convergence rate for smooth objectives, (ii) standard convergence rate in the general (non-smooth) setting, and (iii) standard convergence rate in the stochastic optimization setting. To the best of our knowledge, this is the first method that simultaneously applies to all of the above settings. At the heart of our method is an adaptive learning rate rule that employs importance weights, in …
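
For intuition, here is a minimal sketch of a gradient method whose learning rate adapts to accumulated, importance-weighted gradient norms, in the spirit of the abstract. It is an illustration only, not the authors' actual algorithm; the linear weights, the scale parameter D, and the weighted averaging are assumptions.

    import numpy as np

    def weighted_adagrad_norm(grad, x0, T=1000, D=1.0):
        """Illustrative sketch (not the paper's exact method): gradient descent
        whose learning rate adapts to accumulated, importance-weighted
        squared gradient norms."""
        x = np.asarray(x0, dtype=float)
        x_avg = np.zeros_like(x)
        acc, w_sum = 0.0, 0.0
        for t in range(1, T + 1):
            w = float(t)                            # importance weight grows with t
            g = grad(x)
            acc += (w * np.linalg.norm(g)) ** 2     # accumulate weighted squared gradient norms
            eta = D / (np.sqrt(acc) + 1e-12)        # adaptive learning rate
            x = x - eta * w * g
            x_avg += w * x
            w_sum += w
        return x_avg / w_sum                        # weighted average of the iterates

    # Example usage on a simple quadratic:
    # sol = weighted_adagrad_norm(lambda z: 2 * z, x0=np.ones(5))

On smooth problems the gradient norms shrink as the iterates improve, so the accumulated term grows slowly and the effective stepsize stays comparatively large; on non-smooth or noisy problems it grows steadily and the stepsize decays. That adaptivity is the rough intuition behind a single method covering the three regimes listed above.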

Cited by 5 publications (6 citation statements)
References 19 publications

“…Recently, have analyzed a choice of adaptive stepsizes similar to the global stepsizes we consider, but their result in the convex setting requires the norm of the gradients strictly greater than zero. Levy et al (2018) propose an acceleration method with adaptive stepsizes which are also similar to our global ones, proving the Õ(1/T²) convergence in the deterministic smooth case and Õ(1/√T) in both general deterministic case and stochastic smooth case, but requiring a bounded-domain assumption.…”
Section: Related Work (mentioning)
confidence: 85%
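
For context, the "global" adaptive stepsize discussed in this snippet is commonly of the AdaGrad-norm form η_t = α / √(Σ_{s=1}^{t} ‖g_s‖²), where g_s is the s-th observed (stochastic) gradient and α is a scale parameter; this is a generic formulation, and the exact rules analyzed in the cited works may differ.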
“…An alternative way, which uses the norm of the current (sub)gradient to define the step-size, was initiated probably by [219] and became very popular in stochastic optimization for machine learning after the paper [220]. On this avenue it was possible to obtain, for ν ∈ {0, 1}, a universal accelerated optimization method [221] and universal methods for variational inequalities and saddle-point problems [222,223].…”
Section: Connection Between Accelerated Methods and Conditional Gradient (mentioning)
confidence: 99%
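
One simple instance of a step-size defined through the norm of the current (sub)gradient, as described in the snippet above, is normalized subgradient descent. The sketch below is illustrative; the horizon T, the scale D, and the safeguard eps are assumptions rather than choices made in the cited works.

    import numpy as np

    def normalized_subgradient_descent(subgrad, x0, T=1000, D=1.0, eps=1e-12):
        """Illustrative sketch: each step is scaled by the inverse norm of the
        current (sub)gradient, i.e. eta_t = D / (sqrt(T) * ||g_t||)."""
        x = np.asarray(x0, dtype=float)
        for _ in range(T):
            g = subgrad(x)
            eta = D / (np.sqrt(T) * (np.linalg.norm(g) + eps))  # current-gradient stepsize
            x = x - eta * g
        return x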
“…MAG utilizes an MLMC gradient estimation technique that effectively transduces bias into variance; that is, our gradient estimator enjoys a low bias, but exhibits a high variance that depends on the mixing time (see Lemma 3.1). To cope with this dependence, we then resort to AdaGrad (Duchi et al, 2011), which is known to implicitly adapt to the variance of the stochastic gradients (Levy et al, 2018). We describe MAG in Algorithm 1, and state its main convergence guarantee for convex objectives in Theorem 3.3.…”
Section: MAG: Combining Multi-level Monte Carlo Gradient Estimation W... (mentioning)
confidence: 99%
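
A typical multi-level Monte Carlo (MLMC) gradient estimator of the kind referenced here can be sketched as follows; the construction is generic, it assumes a hypothetical helper sample_grads(n) that returns an (n, d) array of n consecutive stochastic gradients from the data stream, and the exact estimator used by MAG may differ.

    import numpy as np

    def mlmc_gradient(sample_grads, j_max=10):
        """Generic MLMC gradient estimator sketch: low bias (that of the finest
        level, in expectation) while the 2^J factor inflates the variance of
        the correction term."""
        J = int(np.random.geometric(p=0.5))            # P(J = j) = 2^{-j}
        if J > j_max:                                  # skip overly deep levels
            return sample_grads(1)[0]
        grads = sample_grads(2 ** J)                   # 2^J consecutive gradients
        g_fine = grads.mean(axis=0)                    # finest estimate at level J
        g_coarse = grads[: 2 ** (J - 1)].mean(axis=0)  # coupled coarser estimate
        g_base = grads[0]                              # single-sample baseline
        return g_base + (2 ** J) * (g_fine - g_coarse)

Because P(J = j) = 2^{-j}, the telescoping correction has expectation Σ_j (E[g^j] − E[g^{j−1}]), so, under stationarity of the stream, the estimator's bias matches that of an average over 2^{j_max} consecutive samples while it costs only about j_max gradient evaluations in expectation; the 2^J factor is what inflates the variance, matching the low-bias/high-variance trade-off described in the snippet.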