2021
DOI: 10.48550/arxiv.2112.02952
Preprint

Gradient Regularization of Newton Method with Bregman Distances

Abstract: In this paper, we propose a first second-order scheme based on arbitrary non-Euclidean norms, incorporated by Bregman distances. They are introduced directly in the Newton iterate with regularization parameter proportional to the square root of the norm of the current gradient. For the basic scheme, as applied to the composite optimization problem, we establish the global convergence rate of the order O(k^{-2}) both in terms of the functional residual and in the norm of subgradients. Our main assumption on the …
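
For intuition, the following is a minimal sketch (not from the paper) of the Euclidean special case of the scheme the abstract describes: a Newton step whose quadratic regularization parameter is proportional to the square root of the current gradient norm. The names grad_f, hess_f, and lip_est (an assumed estimate of the Hessian Lipschitz constant) are illustrative only; the paper's actual method replaces the Euclidean regularizer with a Bregman distance induced by a general scaling function.

    import numpy as np

    def grad_reg_newton_step(grad_f, hess_f, x, lip_est):
        # One gradient-regularized Newton step (Euclidean sketch; hypothetical helper names).
        g = grad_f(x)                               # current gradient
        H = hess_f(x)                               # current Hessian
        lam = np.sqrt(lip_est * np.linalg.norm(g))  # regularizer proportional to sqrt(||g||)
        # Solve the regularized Newton system (H + lam * I) s = -g
        s = np.linalg.solve(H + lam * np.eye(x.size), -g)
        return x + s

Under Lipschitz continuity of the Hessian, this choice of regularizer is what gives the global O(k^{-2}) rate stated in the abstract.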

Cited by 4 publications (11 citation statements)
References 3 publications (4 reference statements)

“…Cartis et al [14], Gould et al [21] propose adaptive variants of cubic regularization for non-convex optimization. For convex optimization, Mishchenko [30] provides a simple adaptive scheme converging at rate O(t^{-2}), followed by an improvement in its guarantee to O(t^{-3}) [17]. For tensor methods, Jiang et al [25] propose an adaptive regularization scheme for convex functions with Lipschitz-continuous p-th derivatives which achieves the rate O(t^{-p-1}).…”
Section: Additional Related Work (mentioning)
confidence: 99%
“…Even for the case of second-order methods with Lipschitz-continuous Hessian, an adaptive scheme with the optimal rate O(t^{-3.5}) remained open prior to this work. Beyond removing the bisection from the MS framework, our key algorithmic techniques include directly considering the quadratically-regularized Newton step (similar to [30,17] and different from [22,25]), and using the original MS approximation condition for selecting an appropriate regularization parameter, which is new in the context of adaptive methods. These techniques allow us to adapt to both the constant and order of Hessian Hölder continuity simultaneously.…”
Section: Additional Related Work (mentioning)
confidence: 99%
“…However, as noted in [33], the improvement in complexity has been obtained by trading the simple Newton step, requiring only the solution of a single linear system, for more complex or slower procedures, such as secular iterations, possibly using Lanczos preprocessing [6,8] (see also [12, Chapters 8 to 10]) or (conjugate-)gradient descent [29,4]. In the simpler context of convex problems, two recent papers [33,17] independently proposed another globalization technique. At an iterate x, the step s is computed as…”
Section: Introduction (mentioning)
confidence: 99%
“…Their proposed scheme achieves a global O(k^{-2}) rate, where k is the iteration number. [28] follows a similar idea using Bregman distances, and extends it to develop an accelerated variant of the scheme that achieves an O(k^{-3}) rate.…”
Section: Introduction (mentioning)
confidence: 99%