“…For nonconvex optimization, these latter variants exhibit a worst-case O(ε^{−3/2}) complexity order to find an ε-first-order minimizer, compared with the O(ε^{−2}) order of second-order trust-region methods [26], [12, Section 3.2]. Adaptive cubic regularization was later extended to handle inexact derivatives [40,41,2,1], probabilistic models [1,13], and even schemes in which the value of the objective function is never computed [24]. However, as noted in [33], the improvement in complexity has been obtained by trading the simple Newton step, which requires only the solution of a single linear system, for more complex or slower procedures, such as secular iterations, possibly using Lanczos preprocessing [6,8] (see also [12, Chapters 8 to 10]), or (conjugate-)gradient descent [29,4].…”
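The trade-off described in the quotation can be made concrete with a small sketch: the classical Newton step solves one linear system, whereas a cubic-regularization step must (approximately) minimize a regularized model, here done by plain gradient descent as one instance of the "slower procedures" mentioned. The step size, iteration count, and test problem below are illustrative assumptions, not taken from the source.

```python
import numpy as np

def newton_step(g, H):
    # Classical Newton step: a single linear system H s = -g.
    return np.linalg.solve(H, -g)

def cubic_reg_step(g, H, sigma, lr=0.01, iters=2000):
    # Sketch of a cubic-regularization subproblem solve: minimize
    #   m(s) = g^T s + (1/2) s^T H s + (sigma/3) ||s||^3
    # by plain gradient descent (illustrative choice; practical codes
    # use secular iterations, Lanczos, or Krylov methods instead).
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + H @ s + sigma * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s
```

For a positive-definite Hessian and small σ, the cubic-regularized step is close to the Newton step, but it costs many inner iterations rather than one factorization, which is exactly the trade mentioned in the text.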