Published: 2017
DOI: 10.1137/15m1021106
Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence

Abstract: We propose a randomized second-order method for optimization known as the Newton Sketch: it is based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian. For self-concordant functions, we prove that the algorithm has super-linear convergence with exponentially high probability, with convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities. Given a suitable initialization, similar guarantees also hold for strongly convex and smooth objectives.
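To make the idea concrete, here is a minimal NumPy sketch of one Newton Sketch iteration, assuming a least-squares objective f(x) = ½‖Ax − b‖² whose Hessian square root is simply A; a Gaussian sketch S replaces the exact Hessian AᵀA with (SA)ᵀ(SA). The function name, sketch size, and fixed unit step are illustrative choices, not the paper's implementation (which also covers sub-sampled randomized Hadamard sketches and uses a line search).

```python
import numpy as np

def newton_sketch_step(A, b, x, m, rng):
    """One approximate Newton step for f(x) = 0.5 * ||Ax - b||^2.

    The exact Hessian is A.T @ A; we instead solve against the
    sketched Hessian (S A).T @ (S A), where S is an m x n Gaussian
    sketch with m much smaller than the number of rows n.
    """
    n = A.shape[0]
    grad = A.T @ (A @ x - b)                      # exact gradient
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch matrix
    SA = S @ A                                    # sketched Hessian square root
    delta = np.linalg.solve(SA.T @ SA, -grad)     # approximate Newton direction
    return x + delta                              # unit step; the paper uses line search

# Usage on a tall least-squares problem:
rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 50))
b = rng.standard_normal(5000)
x = np.zeros(50)
for _ in range(10):
    x = newton_sketch_step(A, b, x, m=400, rng=rng)
print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm shrinks across iterations
```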

Cited by 171 publications (189 citation statements)
References 29 publications (55 reference statements)
“…Other methods for approximating the leverage scores are available [17–19] that do not require directly computing the singular vectors. We also note that a multitude of other efficient constructions of sketching matrices and iterative sketching algorithms have been studied in the literature [20–23].…”
Section: Leverage Score Sketching
confidence: 99%
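For reference, the exact leverage scores that those methods approximate are the squared row norms of any orthonormal basis U for the column space of A (so they sum to the rank of A). The snippet below is a minimal NumPy illustration via the thin SVD, i.e., exactly the direct computation of singular vectors that the methods cited above avoid; the function name is ours.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores of a tall matrix A (n x d, n >= d).

    The i-th score is the squared norm of row i of U from the thin
    SVD A = U @ diag(s) @ Vt; the scores sum to the rank of A.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U**2, axis=1)

A = np.random.default_rng(1).standard_normal((1000, 20))
scores = leverage_scores(A)
print(scores.shape, scores.sum())  # (1000,) and ~20.0 (the rank of A)
```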
“…A fruitful line of research has focused on how to improve the asymptotic convergence rate as t → ∞ through preconditioning: a technique that involves approximating the unknown Hessian H(θ) = ∇²_θ L(θ) (see, for instance, Bordes et al. (2009) and references therein). Utilizing the curvature information reflected by various efficient approximations of the Hessian matrix, stochastic quasi-Newton methods (Moritz et al., 2016; Byrd et al., 2016; Wang et al., 2017; Schraudolph et al., 2007; Mokhtari and Ribeiro, 2015; Becker and Fadili, 2012), Newton sketching or subsampled Newton methods (Pilanci and Wainwright, 2015; Xu et al., 2016; Berahas et al., 2017; Bollapragada et al., 2016), and stochastic approximation of the inverse Hessian via Taylor series expansion (Agarwal et al., 2017) have been proposed to strike a balance between convergence rate and per-iteration complexity.…”
Section: Relationships to the Literature
confidence: 99%
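As one concrete instance, the Taylor-series approach mentioned last rests on the Neumann-series identity H⁻¹ = Σᵢ (I − H)ⁱ, valid once H is scaled so its eigenvalues lie in (0, 1); truncating the sum gives a cheap approximate inverse. The deterministic toy below illustrates just this identity (the scaling and iteration count are our choices; the cited stochastic method additionally subsamples the Hessian terms).

```python
import numpy as np

# Neumann (Taylor) series behind inverse-Hessian approximations:
# if H is symmetric positive definite with eigenvalues in (0, 1),
# then H^{-1} = sum_{i>=0} (I - H)^i, and truncating the sum gives
# a cheap approximate inverse (hence an approximate Newton step).
rng = np.random.default_rng(3)
B = rng.standard_normal((30, 30))
H = B @ B.T / 30 + 0.1 * np.eye(30)    # a positive definite "Hessian"
H /= 1.1 * np.linalg.norm(H, 2)        # rescale eigenvalues into (0, 1)

I = np.eye(30)
approx_inv = np.zeros_like(H)
term = I.copy()
for _ in range(500):                   # truncated series: sum of (I - H)^i
    approx_inv += term
    term = term @ (I - H)

print(np.linalg.norm(approx_inv @ H - I))  # error decays geometrically to ~0
```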
“…Our sketching methods derived below are based on (2.6) and therefore have the capacity to utilize curvature information. In particular, if the objective function is quadratic, our methods can be interpreted as novel extensions, to more general optimization models, of the recently introduced iterative Hessian sketch method for minimizing self-concordant objective functions [24]. The reader should also note that we can further relax condition (2.6) and require smoothness of f with respect to any image space generated by the random matrix S. More precisely, it is sufficient to assume that for any sample S ∼ S there exists a positive semidefinite matrix M_S such that M_S is positive definite on ker(A) ∩ range(S) and the following inequality holds:…”
Section: Sufficient Conditions for Sketching
confidence: 99%
“…Sketching is a very general framework that covers, as a particular case, the (block) coordinate descent methods [15] when the sketch matrix is given by sampling columns of the identity matrix. Sketching has been used, with great success, either to decrease the computational burden of evaluating the gradient in first-order methods [22] or to avoid solving for the full Newton direction in second-order methods [24]. Another crucial advantage of sketching is that, for structured problems, it keeps the computational cost low while bringing the same amount of data from RAM to CPU as full-gradient or Newton methods, and consequently allows for better CPU utilization on modern multi-core machines.…”
confidence: 99%
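The identity-sampling special case mentioned above is easy to exhibit. In the hypothetical NumPy sketch below (function name ours), taking the sketch matrix S = I[:, idx] restricts a least-squares Newton step to the coordinate block idx, which is exactly one exact-minimization step of block coordinate descent.

```python
import numpy as np

def block_cd_step(A, b, x, idx):
    """Sketched Newton step for f(x) = 0.5 * ||Ax - b||^2 with S = I[:, idx].

    Restricting the update to range(S) collapses the Newton system to
    the coordinate block idx: exact minimization over that block, i.e.
    one block coordinate descent step.
    """
    grad = A.T @ (A @ x - b)                 # full gradient
    A_blk = A[:, idx]                        # columns selected by S
    H_blk = A_blk.T @ A_blk                  # S^T (A^T A) S, a small block
    u = np.linalg.solve(H_blk, -grad[idx])   # block Newton direction
    x = x.copy()
    x[idx] += u
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 40))
b = rng.standard_normal(500)
x = np.zeros(40)
for _ in range(200):
    idx = rng.choice(40, size=5, replace=False)  # random coordinate block
    x = block_cd_step(A, b, x, idx)
print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm shrinks toward zero
```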