2016
DOI: 10.48550/arxiv.1611.00756
Preprint

Accelerated Methods for Non-Convex Optimization

Abstract: We present an accelerated gradient method for non-convex optimization problems with Lipschitz continuous first and second derivatives. The method requires time O(ε^(−7/4) log(1/ε)) to find an ε-stationary point, meaning a point x such that ‖∇f(x)‖ ≤ ε. The method improves upon the O(ε^(−2)) complexity of gradient descent and provides the additional second-order guarantee that ∇²f(x) ⪰ −O(ε^(1/2))I for the computed x. Furthermore, our method is Hessian-free, i.e. it only requires gradient computations, and is therefore …
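As a concrete illustration of the ε-stationarity criterion ‖∇f(x)‖ ≤ ε described in the abstract, the sketch below runs a standard Nesterov-style accelerated gradient loop until that test is met. It is a minimal sketch, not the paper's method (which is likewise Hessian-free but additionally exploits negative curvature to reach the O(ε^(−7/4) log(1/ε)) rate); the test function, the step size 1/L, and the momentum schedule are illustrative assumptions.

import numpy as np

def accelerated_gd(grad_f, x0, L=10.0, eps=1e-4, max_iters=100_000):
    """Run Nesterov-style accelerated gradient descent until ||grad f(x)|| <= eps.

    grad_f : callable returning the gradient at a point (numpy array)
    x0     : starting point
    L      : assumed Lipschitz constant of the gradient (hypothetical value)
    eps    : stationarity tolerance from the abstract's stopping criterion
    """
    x = x0.copy()
    y = x0.copy()   # extrapolated ("momentum") iterate
    t = 1.0         # Nesterov momentum parameter
    for _ in range(max_iters):
        g = grad_f(y)
        if np.linalg.norm(g) <= eps:   # eps-stationary point reached
            return y
        x_next = y - g / L             # gradient step from the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # momentum extrapolation
        x, t = x_next, t_next
    return y

# Example on a simple non-convex function f(x) = ||x||^2 + cos(sum(x)):
grad = lambda x: 2.0 * x - np.sin(np.sum(x))
x_hat = accelerated_gd(grad, np.ones(5), L=10.0, eps=1e-6)

On non-convex problems, plain momentum of this kind carries no ε^(−7/4) guarantee; the sketch only shows how the stopping rule is checked in practice.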

Cited by 34 publications (80 citation statements)
References 30 publications
“…[fragment of a comparison table: Reference | Oracle | Iterations | Simplicity; non-stochastic methods [1,6] use a Hessian-vector product oracle with Õ(log n/… iterations] It is worth highlighting that our gradient-descent based algorithm enjoys the following nice features:…”
Section: Setting (mentioning)
confidence: 99%
“…On the contrary, algorithms with nested loops often suffer from significant overheads in large scales, or introduce concerns with the setting of hyperparameters and numerical stability (see e.g. [1,6]), making them relatively hard to find practical implementations.…”
Section: Introduction (mentioning)
confidence: 99%
“…Momentum acceleration methods are used regularly in the convex setting, as well as in machine learning practical scenarios [50,87,50,11,68,15,32]. While momentum acceleration was previously studied in nonconvex programming setups, it mostly involve non-convex constraints with a convex objective function [52,53,49,97]; and generic non-convex settings but only considering with the question of whether momentum acceleration leads to fast convergence to a saddle point or to a local minimum, rather than to a global optimum [31,56,18,4].…”
Section: Related Work (mentioning)
confidence: 99%
“…By utilizing second-order information, one can obtain improved rate of convergence to approximate local minima. This includes approaches based on Nesterov and Polyak's cubic regularization [1,18,27], or first-order method with accelerated gradient method as a sub-solver for escaping saddle points [2].…”
Section: Introduction (mentioning)
confidence: 99%