2014
DOI: 10.48550/arxiv.1407.1537
Preprint

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

Abstract: First-order methods play a central role in large-scale machine learning. Even though many variations exist, each suited to a particular problem, almost all such methods fundamentally rely on two types of algorithmic steps: gradient descent, which yields primal progress, and mirror descent, which yields dual progress. We observe that the performances of gradient and mirror descent are complementary, so that faster algorithms can be designed by linearly coupling the two. We show how to reconstruct Nesterov's accelerated gradient methods…
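
To make the coupling idea concrete, below is a minimal sketch in Python of one way a linearly coupled gradient/mirror iteration can look for an L-smooth convex objective. It assumes the Euclidean mirror map (so the mirror step reduces to a plain step on a separate sequence) and uses an illustrative step-size schedule; the names `grad_f`, `alpha`, and `tau` are placeholders, not notation from the paper.

```python
import numpy as np

def linear_coupling(grad_f, x0, L, n_iters=100):
    """Sketch of a linearly coupled gradient/mirror descent loop.

    Assumes f is convex and L-smooth and uses the Euclidean mirror map,
    so the mirror step is just a step on the dual sequence z.
    The step-size schedule below is illustrative, not the paper's exact one.
    """
    y = x0.copy()   # gradient-descent sequence (primal progress)
    z = x0.copy()   # mirror-descent sequence (dual progress)
    for k in range(1, n_iters + 1):
        alpha = (k + 1) / (2.0 * L)      # mirror step size, growing with k
        tau = 1.0 / (alpha * L)          # coupling weight in (0, 1]
        x = tau * z + (1.0 - tau) * y    # linearly couple the two sequences
        g = grad_f(x)
        y = x - g / L                    # gradient step: primal progress
        z = z - alpha * g                # mirror step (Euclidean case): dual progress
    return y

# Example: minimize f(x) = 0.5 * ||A x - b||^2, which is L-smooth with L = ||A||_2^2
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A, 2) ** 2
x_star = linear_coupling(grad_f, np.zeros(2), L, n_iters=200)
```

The intuition behind the coupling, as the abstract indicates, is that the two guarantees are strong in opposite regimes: the gradient step makes large primal progress when the gradient is large, while the mirror step loses little when the gradient is small.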

Cited by 45 publications (90 citation statements). References 18 publications.
“…Also both SCSG and SARAH are non-accelerated methods and thus also do not achieve the optimal convergence results. Therefore, much recent research effort has been devoted to the design of accelerated gradient methods (e.g., Nesterov, 2004; Lan, 2012; Allen-Zhu and Orecchia, 2014; Su et al, 2014; Lin et al, 2015; Allen-Zhu, 2017; Lan and Zhou, 2018; Lan et al, 2019; Li et al, 2020b). As can be seen from Table 1, for strongly convex finite-sum problems, existing accelerated methods such as RPDG (Lan and Zhou, 2015), Katyusha (Allen-Zhu, 2017) and Varag (Lan et al, 2019) are optimal, since their convergence results are $O\big((n + \sqrt{nL/\mu})\log\frac{1}{\epsilon}\big)$, matching the lower bound $\Omega\big((n + \sqrt{nL/\mu})\log\frac{1}{\epsilon}\big)$ given by Lan and Zhou (2015).…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
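
For context on the rates quoted above, a rough side-by-side of the commonly cited complexities for L-smooth, µ-strongly convex finite-sum problems with n components and target accuracy ε (constants and assumptions vary by paper; this is the textbook form, not a quotation from the cited works):

$$\text{non-accelerated (SVRG/SAGA-type):}\ \ O\!\Big(\big(n + \tfrac{L}{\mu}\big)\log\tfrac{1}{\epsilon}\Big), \qquad \text{accelerated (RPDG, Katyusha, Varag):}\ \ O\!\Big(\big(n + \sqrt{\tfrac{nL}{\mu}}\big)\log\tfrac{1}{\epsilon}\Big).$$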
“…Similar accelerated multi-step methods have also been investigated for solving non-smooth problems of the form (1), e.g., [6, 18, 38, 49]. The great theoretical properties as well as the empirical performance of such accelerated methods have prompted many authors to try to understand the underlying mechanism and the natural scope of the acceleration concept, e.g., physical momentum, relations to other first-order algorithms, as well as geometrical and continuous-time dynamics points of view [1, 10, 19, 27, 30, 46, 50]. Most relevant to the present paper is the result of [1], in which an acceleration scheme was designed by an appropriate linear coupling of the gradient and mirror descent steps to draw upon their complementary characteristics.…”
Section: Example (citation type: mentioning)
Confidence: 99%
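
The "complementary characteristics" referenced in that statement can be summarized by two standard per-step guarantees, stated here for the Euclidean case with an L-smooth convex f (the constants follow the usual textbook form rather than any one paper's notation): the gradient step makes primal progress proportional to the squared gradient norm, while the mirror step's per-step regret loss is also controlled by it, so one bound is strong exactly when the other is weak.

$$\text{gradient step } y = x - \tfrac{1}{L}\nabla f(x):\quad f(x) - f(y) \ \ge\ \tfrac{1}{2L}\,\|\nabla f(x)\|^2,$$
$$\text{mirror step } z^{+} = z - \alpha\,\nabla f(x):\quad \alpha\,\langle\nabla f(x),\, z - u\rangle \ \le\ \tfrac{\alpha^2}{2}\,\|\nabla f(x)\|^2 + \tfrac{1}{2}\|z - u\|^2 - \tfrac{1}{2}\|z^{+} - u\|^2 \quad \forall u.$$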
“…The intuition behind this algorithm puzzled researchers for decades, and many articles are devoted to understanding the underlying mechanism (Allen-Zhu and Orecchia, 2014; Defazio, 2019; Ahn, 2020) and the role of the small yet crucial modification compared to HB (Flammarion and Bach, 2015; Lessard et al, 2016; Hu and Lessard, 2017). Notwithstanding the theoretical value of these contributions, they are arguably only of a descriptive nature and leave open more fundamental questions on the reason behind acceleration.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
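
For readers unfamiliar with the "small yet crucial modification" relative to HB (the heavy-ball method) mentioned in that statement, the two updates are usually written side by side as follows (step size η and momentum β are generic; this is the standard textbook formulation, not a quotation from the cited works):

$$\text{heavy ball (HB):}\quad x_{k+1} = x_k - \eta\,\nabla f(x_k) + \beta\,(x_k - x_{k-1}),$$
$$\text{Nesterov:}\quad x_{k+1} = x_k - \eta\,\nabla f\big(x_k + \beta\,(x_k - x_{k-1})\big) + \beta\,(x_k - x_{k-1}),$$

i.e. the only change is that the gradient is evaluated at the extrapolated (look-ahead) point rather than at the current iterate.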