2016
DOI: 10.1073/pnas.1614734113

A variational perspective on accelerated methods in optimization

Abstract: Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. Although many generalizations and extensions of Nesterov's original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the Bregman Lagrangian, which generates a large class of accelerated methods in continuous time…
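
The truncated abstract centers on the Bregman Lagrangian. As a reading aid, here is a sketch of its form in the paper's continuous-time setup (α_t, β_t, γ_t are time-dependent scaling functions, h is a distance-generating function, and D_h is its Bregman divergence); treat this as a reference transcription, not a substitute for the published definition:

```latex
% Bregman Lagrangian (sketch); D_h(y, x) = h(y) - h(x) - <\nabla h(x), y - x>
% is the Bregman divergence, and \alpha_t, \beta_t, \gamma_t are scaling functions.
\mathcal{L}(X, \dot{X}, t)
  = e^{\alpha_t + \gamma_t}
    \left( D_h\!\left(X + e^{-\alpha_t} \dot{X},\, X\right) - e^{\beta_t} f(X) \right)

% Under the paper's "ideal scaling" conditions, the Euler--Lagrange equation
% reduces to second-order dynamics of the form
\frac{d}{dt} \nabla h\!\left(X + e^{-\alpha_t} \dot{X}\right)
  = -\, e^{\alpha_t + \beta_t} \nabla f(X)
```

Discretizations of these dynamics recover Nesterov-type accelerated methods for particular choices of the scaling functions.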

Cited by 378 publications (540 citation statements)
References 24 publications
“…Another interesting class is algorithms that include an optimization step within them, such as selection of an optimal step, e.g., [44,46]. We can consider a bilevel formulation of the form…”
Section: Optimal Algorithms With Optimal Steps
confidence: 99%
“…Note that although we have several arg min, they are not correlated, i.e., we do not obtain an n-level optimization problem but rather a bilevel problem with multiple lower-level problems. With such a formulation we might be able to prove optimality of one of the three gradient methods described in [44] or discover alternatives. Additionally, it would be possible to discover well-known methods such as the nonlinear conjugate gradient method [26], where the line search is described by the arg min operator, or even stochastic methods, which often depend on the computation of the maximum expected value of some iterative random variable X^(it).…”
Section: Optimal Algorithms With Optimal Steps
confidence: 99%
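
The quoted passages describe methods that embed an inner optimization, such as choosing the step by an arg min. A minimal, self-contained sketch of that pattern is gradient descent with an exact line search; the quadratic objective, bounds, and function names below are illustrative assumptions, not taken from the cited works [44, 46]:

```python
# Hypothetical sketch of a "method with an inner optimization step": plain
# gradient descent where each step size is chosen by an inner arg min
# (exact line search over a bounded interval).
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent_exact_line_search(f, grad, x0, iters=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = -grad(x)  # steepest-descent direction
        # Inner (lower-level) problem: eta* = argmin_{0 <= eta <= 10} f(x + eta * d)
        inner = minimize_scalar(lambda eta: f(x + eta * d),
                                bounds=(0.0, 10.0), method="bounded")
        x = x + inner.x * d
    return x

# Toy usage on a strongly convex quadratic
A = np.array([[3.0, 0.2], [0.2, 1.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(gradient_descent_exact_line_search(f, grad, x0=[1.0, -2.0]))
```

In the bilevel reading of the quote, the iteration is the upper-level problem and each exact line search is one of the lower-level arg min problems.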
“…Recently, Jordan et al. [19] show that Nesterov's momentum has a strong relation with partial differential equations. They show that the trajectory of NAG corresponds to a brachistochrone according to the variational method.…”
Section: The Variants Of SGD
confidence: 99%
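
For context on the quoted claim, the continuous-time curve underlying NAG is usually written as the following second-order ODE (the limit derived by Su, Boyd, and Candès, recovered in the Bregman-Lagrangian framework with a Euclidean distance-generating function and polynomial scaling); this is standard background cited alongside the paper, not a statement taken from the citing work itself:

```latex
% Continuous-time limit of Nesterov's accelerated gradient for convex f
\ddot{X}(t) + \frac{3}{t}\, \dot{X}(t) + \nabla f\big(X(t)\big) = 0
```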
“…Indeed, the evaluation of the ratio (E[|∇_s J|]/E[|∇_θ J|]) revealed that the objective function is usually 10 times more sensitive with respect to scaling parameters than dynamic parameters (Figure 2B). This indicates the presence of two separate timescales in the continuous representation of the optimization problem (Wibisono et al., 2016), which suggests that the optimization problem is stiff. As standard optimization methods correspond to explicit solving schemes of the continuous optimization problem, the stiffness could explain the problem encountered with the standard approach.…”
Section: Scalings Have a Pronounced Influence On The Objective Function
confidence: 99%
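
To make the stiffness remark concrete: standard gradient descent is the explicit (forward) Euler discretization of the gradient flow dx/dt = -∇f(x), so a large gap in parameter sensitivities caps the stable step size at the scale of the most sensitive direction. A toy sketch under assumed numbers (nothing here comes from the cited study):

```python
# Illustrative sketch: gradient descent as explicit Euler on the gradient flow.
# With curvatures differing by a factor of 100 (a stiff problem, analogous to
# the scaling- vs. dynamic-parameter sensitivity gap in the quote), the stable
# step size is set by the stiffest direction, so the flat direction progresses slowly.
import numpy as np

curvatures = np.array([100.0, 1.0])       # f(x) = 0.5 * sum(c_i * x_i**2)
grad = lambda x: curvatures * x

def explicit_euler(x0, step, n_steps=50):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad(x)            # one gradient-descent / Euler step
    return x

print(explicit_euler([1.0, 1.0], step=0.019))  # stable: 0.019 < 2/100
print(explicit_euler([1.0, 1.0], step=0.021))  # oscillates and grows in the stiff direction
```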