2019
DOI: 10.1137/17m1113898
Gradient Descent Finds the Cubic-Regularized Nonconvex Newton Step

Abstract: We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the global minimum to within ε accuracy in O(ε^{-1} log(1/ε)) steps for large ε and O(log(1/ε)) steps for small ε (compared to a condition number we define), with at most logarithmic dependence on the problem dimension. When we use gradient descent to approximate the Nesterov-Polyak cub…

Cited by 62 publications (99 citation statements)
References 27 publications
“…A direct consequence of Assumption 1 is that for any x ∈ F, it holds that f(p_σ(x)) ≤ f_σ(x) whenever σ ≥ L (see [24, Lemma 4]). This further implies that for all k ≥ 0, we can find a σ_k ≤ 2L such that (8) holds. Indeed, if the Lipschitz constant L is known, we can let σ_k = L. If not, by using a line search strategy that doubles σ_k after each trial [24, Section 5.2], we can find a σ_k ≤ 2L such that (8) holds.…”
Section: The Cubic Regularization Method (mentioning, confidence: 91%)
“…This observation has led to the development of various efficient algorithms for finding p_σ(x) in [10]. More recently, it was shown in [8] that the gradient descent method can also be applied to find p_σ(x). For the global convergence of the CR method, we need the following assumption.…”
Section: The Cubic Regularization Method (mentioning, confidence: 99%)
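Applying gradient descent to the cubic model, as in [8], amounts to iterating on its explicit gradient. The sketch below is a minimal illustration under one common scaling convention (σ/3 on the cube term; others use σ/6), and it does not reproduce the step-size and initialization prescriptions of [8].

```python
import numpy as np

def cubic_model_gd(g, H, sigma, lr=0.01, iters=20000, tol=1e-10):
    """Plain gradient descent on the cubic-regularized model

        m(s) = g.T s + 0.5 s.T H s + (sigma / 3) * ||s||^3,

    whose gradient is g + H s + sigma * ||s|| * s. Illustrative
    sketch only: [8] additionally prescribes a particular step size
    and a perturbed initialization. Starting from s = 0, the first
    step here moves along -g.
    """
    s = np.zeros_like(g, dtype=float)
    for _ in range(iters):
        grad = g + H @ s + sigma * np.linalg.norm(s) * s
        if np.linalg.norm(grad) < tol:
            break
        s = s - lr * grad
    return s
```

Even when H has negative eigenvalues, the cubic term makes m coercive, so the iteration stays bounded for a small enough step size.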
“…Here, L_2 is taken to be the Lipschitz constant of the Hessians (see Definition 10), so as to ensure that the objective function in (158) majorizes the true objective f(·). While the subproblem (158) is nonconvex and may have local minima, it can often be efficiently solved by minimizing an explicitly written univariate convex function [185, Section 5], or even by gradient descent [188].…”
Section: Hessian-based Algorithms (mentioning, confidence: 99%)
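The univariate reformulation alluded to above can be sketched as scalar root-finding: at a global minimizer, s = -(H + λI)^{-1} g with λ = σ‖s‖ and H + λI positive semidefinite, so one can bisect on λ. This sketch glosses over the degenerate "hard case" (g orthogonal to H's bottom eigenvector) and again assumes the σ/3 scaling convention.

```python
import numpy as np

def solve_cubic_subproblem(g, H, sigma, bisections=200):
    """Solve min_s  g.T s + 0.5 s.T H s + (sigma/3) * ||s||^3  via a
    scalar equation (illustrative sketch; the "hard case" is ignored).
    Over lam >= max(0, -lambda_min(H)), the map
    lam -> ||(H + lam I)^{-1} g|| is decreasing while lam/sigma is
    increasing, so the equation ||s(lam)|| = lam/sigma has a unique
    root found by bisection.
    """
    n = len(g)
    lam_min = np.linalg.eigvalsh(H)[0]
    lo = max(0.0, -lam_min) + 1e-12

    def norm_s(lam):
        return np.linalg.norm(np.linalg.solve(H + lam * np.eye(n), -g))

    hi = lo + 1.0
    while norm_s(hi) > hi / sigma:  # grow bracket until the sign flips
        hi *= 2.0
    for _ in range(bisections):
        mid = 0.5 * (lo + hi)
        if norm_s(mid) > mid / sigma:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return np.linalg.solve(H + lam * np.eye(n), -g)
```

Each bisection step costs one linear solve; in practice a single factorization of H (e.g., tridiagonalization) makes all evaluations of ‖s(λ)‖ cheap.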
“…where σ_t is the cubic regularization parameter chosen for the current iteration. As in the case of TR, the major bottleneck of CR involves solving the sub-problem (2b), for which various techniques have been proposed, e.g., [1,4,8,9]. To the best of our knowledge, the use of such regularization was first introduced in the pioneering work of [34], and subsequently further studied in the seminal works of [9,10,45]. From the worst-case complexity point of view, CR has a better dependence on ε_g compared to TR.…”
Section: Cubic Regularization (mentioning, confidence: 99%)