Published: 2017
DOI: 10.1137/15m1021106
Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence

Abstract: We propose a randomized second-order method for optimization known as the Newton Sketch: it is based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian. For self-concordant functions, we prove that the algorithm has super-linear convergence with exponentially high probability, with convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities. Given a suitable initialization, similar guarantees also hold for strongly convex and smooth objectives.
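To make the idea concrete, here is a minimal NumPy sketch of one Newton Sketch iteration, assuming a least-squares objective f(x) = ½‖Ax − b‖² whose Hessian square root is simply A; a Gaussian sketch S replaces the exact Hessian AᵀA with (SA)ᵀ(SA). The function name, sketch size, and fixed unit step are illustrative choices, not the paper's implementation (which also covers sub-sampled randomized Hadamard sketches and uses a line search).

```python
import numpy as np

def newton_sketch_step(A, b, x, m, rng):
    """One approximate Newton step for f(x) = 0.5 * ||Ax - b||^2.

    The exact Hessian is A.T @ A; we instead solve against the
    sketched Hessian (S A).T @ (S A), where S is an m x n Gaussian
    sketch with m much smaller than the number of rows n.
    """
    n = A.shape[0]
    grad = A.T @ (A @ x - b)                      # exact gradient
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch matrix
    SA = S @ A                                    # sketched Hessian square root
    delta = np.linalg.solve(SA.T @ SA, -grad)     # approximate Newton direction
    return x + delta                              # unit step; the paper uses line search

# Usage on a tall least-squares problem:
rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 50))
b = rng.standard_normal(5000)
x = np.zeros(50)
for _ in range(10):
    x = newton_sketch_step(A, b, x, m=400, rng=rng)
print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm shrinks across iterations
```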

Cited by 171 publications (189 citation statements)
References 29 publications (55 reference statements)
“…Other methods for approximating the leverage scores are available [17–19] that do not require directly computing the singular vectors. We also note that a multitude of other efficient constructions of sketching matrices and iterative sketching algorithms have been studied in the literature [20–23].…”
Section: Leverage Score Sketching
confidence: 99%
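For reference, the exact leverage scores that those methods approximate are the squared row norms of any orthonormal basis U for the column space of A (so they sum to the rank of A). The snippet below is a minimal NumPy illustration via the thin SVD, i.e., exactly the direct computation of singular vectors that the methods cited above avoid; the function name is ours.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores of a tall matrix A (n x d, n >= d).

    The i-th score is the squared norm of row i of U from the thin
    SVD A = U @ diag(s) @ Vt; the scores sum to the rank of A.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U**2, axis=1)

A = np.random.default_rng(1).standard_normal((1000, 20))
scores = leverage_scores(A)
print(scores.shape, scores.sum())  # (1000,) and ~20.0 (the rank of A)
```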
“…A fruitful line of research has focused on how to improve the asymptotic convergence rate as t → ∞ through preconditioning: a technique that involves approximating the unknown Hessian H(θ) = ∇²_θ L(θ) (see, for instance, Bordes et al. (2009) and references therein). Utilizing the curvature information reflected by various efficient approximations of the Hessian matrix, stochastic quasi-Newton methods (Moritz et al., 2016; Byrd et al., 2016; Wang et al., 2017; Schraudolph et al., 2007; Mokhtari and Ribeiro, 2015; Becker and Fadili, 2012), Newton sketching or subsampled Newton methods (Pilanci and Wainwright, 2015; Xu et al., 2016; Berahas et al., 2017; Bollapragada et al., 2016), and stochastic approximation of the inverse Hessian via Taylor series expansion (Agarwal et al., 2017) have been proposed to strike a balance between convergence rate and per-iteration complexity.…”
Section: Relationships to the Literature
confidence: 99%
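As one concrete instance, the Taylor-series approach mentioned last rests on the Neumann-series identity H⁻¹ = Σᵢ (I − H)ⁱ, valid once H is scaled so its eigenvalues lie in (0, 1); truncating the sum gives a cheap approximate inverse. The deterministic toy below illustrates just this identity (the scaling and iteration count are our choices; the cited stochastic method additionally subsamples the Hessian terms).

```python
import numpy as np

# Neumann (Taylor) series behind inverse-Hessian approximations:
# if H is symmetric positive definite with eigenvalues in (0, 1),
# then H^{-1} = sum_{i>=0} (I - H)^i, and truncating the sum gives
# a cheap approximate inverse (hence an approximate Newton step).
rng = np.random.default_rng(3)
B = rng.standard_normal((30, 30))
H = B @ B.T / 30 + 0.1 * np.eye(30)    # a positive definite "Hessian"
H /= 1.1 * np.linalg.norm(H, 2)        # rescale eigenvalues into (0, 1)

I = np.eye(30)
approx_inv = np.zeros_like(H)
term = I.copy()
for _ in range(500):                   # truncated series: sum of (I - H)^i
    approx_inv += term
    term = term @ (I - H)

print(np.linalg.norm(approx_inv @ H - I))  # error decays geometrically to ~0
```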
“…Our sketching methods derived below are based on (2.6) and therefore have the capacity to utilize curvature information. In particular, if the objective function is quadratic, our methods can be interpreted as novel extensions, to more general optimization models, of the recently introduced iterative Hessian sketch method for minimizing self-concordant objective functions [24]. The reader should also note that we can further relax condition (2.6) and require smoothness of f with respect to any image space generated by the random matrix S. More precisely, it is sufficient to assume that for any sample S ∼ S there exists a positive semidefinite matrix M_S such that M_S is positive definite on ker(A) ∩ range(S) and the following inequality holds:…”
Section: Sufficient Conditions for Sketching
confidence: 99%
“…Sketching is a very general framework that covers, as a particular case, the (block) coordinate descent methods [15] when the sketch matrix is given by sampling columns of the identity matrix. Sketching has been used, with great success, either to decrease the computational burden of evaluating the gradient in first-order methods [22] or to avoid solving for the full Newton direction in second-order methods [24]. Another crucial advantage of sketching is that, for structured problems, it keeps the computational cost low while bringing the same amount of data from RAM to CPU as full-gradient or Newton methods, and consequently allows for better CPU utilization on modern multi-core machines.…”
confidence: 99%
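The identity-sampling special case mentioned above is easy to exhibit. In the hypothetical NumPy sketch below (function name ours), taking the sketch matrix S = I[:, idx] restricts a least-squares Newton step to the coordinate block idx, which is exactly one exact-minimization step of block coordinate descent.

```python
import numpy as np

def block_cd_step(A, b, x, idx):
    """Sketched Newton step for f(x) = 0.5 * ||Ax - b||^2 with S = I[:, idx].

    Restricting the update to range(S) collapses the Newton system to
    the coordinate block idx: exact minimization over that block, i.e.
    one block coordinate descent step.
    """
    grad = A.T @ (A @ x - b)                 # full gradient
    A_blk = A[:, idx]                        # columns selected by S
    H_blk = A_blk.T @ A_blk                  # S^T (A^T A) S, a small block
    u = np.linalg.solve(H_blk, -grad[idx])   # block Newton direction
    x = x.copy()
    x[idx] += u
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 40))
b = rng.standard_normal(500)
x = np.zeros(40)
for _ in range(200):
    idx = rng.choice(40, size=5, replace=False)  # random coordinate block
    x = block_cd_step(A, b, x, idx)
print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm shrinks toward zero
```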