2016
DOI: 10.1073/pnas.1614734113

A variational perspective on accelerated methods in optimization

Abstract: Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. Although many generalizations and extensions of Nesterov's original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the Bregman Lagrangian, which generates a large class of accelerated methods in continuous time…
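
The truncated abstract centers on the Bregman Lagrangian. As a reading aid, here is a sketch of its form in the paper's continuous-time setup (α_t, β_t, γ_t are time-dependent scaling functions, h is a distance-generating function, and D_h is its Bregman divergence); treat this as a reference transcription, not a substitute for the published definition:

```latex
% Bregman Lagrangian (sketch); D_h(y, x) = h(y) - h(x) - <\nabla h(x), y - x>
% is the Bregman divergence, and \alpha_t, \beta_t, \gamma_t are scaling functions.
\mathcal{L}(X, \dot{X}, t)
  = e^{\alpha_t + \gamma_t}
    \left( D_h\!\left(X + e^{-\alpha_t} \dot{X},\, X\right) - e^{\beta_t} f(X) \right)

% Under the paper's "ideal scaling" conditions, the Euler--Lagrange equation
% reduces to second-order dynamics of the form
\frac{d}{dt} \nabla h\!\left(X + e^{-\alpha_t} \dot{X}\right)
  = -\, e^{\alpha_t + \beta_t} \nabla f(X)
```

Discretizations of these dynamics recover Nesterov-type accelerated methods for particular choices of the scaling functions.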

Cited by 378 publications (540 citation statements)
References 24 publications
“…Another interesting class is algorithms that include an optimization step within them, such as selection of an optimal step, e.g., [44,46]. We can consider a bilevel formulation of the form…”
Section: Optimal Algorithms With Optimal Steps
confidence: 99%
“…Note that although we have several arg min, they are not correlated, i.e., we do not obtain an n-level optimization problem but rather a bilevel problem with multiple lower-level problems. With such a formulation we might be able to prove optimality of one of the three gradient methods described in [44] or discover alternatives. Additionally, it would be possible to discover well-known methods such as the nonlinear conjugate gradient method [26], where the line search is described by the arg min operator, or even stochastic methods, which often depend on the computation of the maximum expected value of some iterative random variable X^(it).…”
Section: Optimal Algorithms With Optimal Steps
confidence: 99%
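
The quoted passages describe methods that embed an inner optimization, such as choosing the step by an arg min. A minimal, self-contained sketch of that pattern is gradient descent with an exact line search; the quadratic objective, bounds, and function names below are illustrative assumptions, not taken from the cited works [44, 46]:

```python
# Hypothetical sketch of a "method with an inner optimization step": plain
# gradient descent where each step size is chosen by an inner arg min
# (exact line search over a bounded interval).
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent_exact_line_search(f, grad, x0, iters=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = -grad(x)  # steepest-descent direction
        # Inner (lower-level) problem: eta* = argmin_{0 <= eta <= 10} f(x + eta * d)
        inner = minimize_scalar(lambda eta: f(x + eta * d),
                                bounds=(0.0, 10.0), method="bounded")
        x = x + inner.x * d
    return x

# Toy usage on a strongly convex quadratic
A = np.array([[3.0, 0.2], [0.2, 1.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(gradient_descent_exact_line_search(f, grad, x0=[1.0, -2.0]))
```

In the bilevel reading of the quote, the iteration is the upper-level problem and each exact line search is one of the lower-level arg min problems.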
“…Recently, Jordan et al. [19] show that Nesterov's momentum has a strong relation with partial differential equations. They show that the trajectory of NAG corresponds to a brachistochrone according to the variational method.…”
Section: The Variants Of SGD
confidence: 99%
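
For context on the quoted claim, the continuous-time curve underlying NAG is usually written as the following second-order ODE (the limit derived by Su, Boyd, and Candès, recovered in the Bregman-Lagrangian framework with a Euclidean distance-generating function and polynomial scaling); this is standard background cited alongside the paper, not a statement taken from the citing work itself:

```latex
% Continuous-time limit of Nesterov's accelerated gradient for convex f
\ddot{X}(t) + \frac{3}{t}\, \dot{X}(t) + \nabla f\big(X(t)\big) = 0
```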
“…Indeed, the evaluation of the ratio (E[|∇_s J|]/E[|∇_θ J|]) revealed that the objective function is usually 10 times more sensitive with respect to scaling parameters than dynamic parameters (Figure 2B). This indicates the presence of two separate timescales in the continuous representation of the optimization problem (Wibisono et al., 2016), which suggests that the optimization problem is stiff. As standard optimization methods correspond to explicit solving schemes of the continuous optimization problem, the stiffness could explain the problem encountered with the standard approach.…”
Section: Scalings Have a Pronounced Influence On The Objective Function
confidence: 99%
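
To make the stiffness remark concrete: standard gradient descent is the explicit (forward) Euler discretization of the gradient flow dx/dt = -∇f(x), so a large gap in parameter sensitivities caps the stable step size at the scale of the most sensitive direction. A toy sketch under assumed numbers (nothing here comes from the cited study):

```python
# Illustrative sketch: gradient descent as explicit Euler on the gradient flow.
# With curvatures differing by a factor of 100 (a stiff problem, analogous to
# the scaling- vs. dynamic-parameter sensitivity gap in the quote), the stable
# step size is set by the stiffest direction, so the flat direction progresses slowly.
import numpy as np

curvatures = np.array([100.0, 1.0])       # f(x) = 0.5 * sum(c_i * x_i**2)
grad = lambda x: curvatures * x

def explicit_euler(x0, step, n_steps=50):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad(x)            # one gradient-descent / Euler step
    return x

print(explicit_euler([1.0, 1.0], step=0.019))  # stable: 0.019 < 2/100
print(explicit_euler([1.0, 1.0], step=0.021))  # oscillates and grows in the stiff direction
```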