We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time that scales linearly in the underlying dimension and the number of training examples. Our algorithm finds an approximate local minimum even faster than gradient descent finds a critical point. It applies to a general class of optimization problems, including training a neural network and other non-convex objectives arising in machine learning.
First-order stochastic methods are the state of the art in large-scale machine learning optimization owing to their efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient-based methods, and in certain settings improve upon the overall running time of popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.
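The first-order per-iteration cost claimed in the abstracts above typically rests on Hessian-vector products, which cost roughly one extra gradient evaluation and never form the Hessian explicitly. A minimal sketch, using a finite-difference approximation of the product (the function and example below are illustrative, not taken from the papers):

```python
import numpy as np

def hessian_vector_product(grad_fn, x, v, eps=1e-5):
    """Approximate H(x) @ v with two gradient calls (finite differences).

    This is the standard trick that lets stochastic second-order methods
    match the per-iteration cost of gradient methods: the Hessian is
    never formed explicitly.
    """
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

# Toy example: f(x) = 0.5 * x^T A x, so grad f(x) = A x and H(x) = A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_fn = lambda x: A @ x
x = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
hv = hessian_vector_product(grad_fn, x, v)  # ≈ A @ v
```

For a quadratic the finite difference is exact up to floating-point error; in general the error is O(eps²). Automatic differentiation frameworks provide the same product exactly at a similar cost.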
We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that does nearly as well as a procedure with full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in two main aspects: our model allows for adversarial noise in the dynamics, and allows for general convex costs.
Adaptive regularization with cubics (ARC) is an algorithm for unconstrained, nonconvex optimization. Akin to the trust-region method, its iterations can be thought of as approximate, safeguarded Newton steps. For cost functions with Lipschitz continuous Hessian, ARC has optimal iteration complexity, in the sense that it produces an iterate with gradient smaller than ε in O(1/ε^1.5) iterations. For the same price, it can also guarantee a Hessian with smallest eigenvalue larger than −√ε. In this paper, we study a generalization of ARC to optimization on Riemannian manifolds. In particular, we generalize the iteration complexity results to this richer framework. Our central contribution lies in the identification of appropriate manifold-specific assumptions that allow us to secure these complexity guarantees both when using the exponential map and when using a general retraction. A substantial part of the paper is devoted to studying these assumptions, relevant beyond ARC, and providing user-friendly sufficient conditions for them. Numerical experiments are encouraging.
Keywords: Optimization on manifolds · Complexity · Lipschitz regularity · Cubic regularization · Newton's method
Mathematics Subject Classification: 90C26 (Nonconvex programming, global optimization) · 53Z99 (Applications of differential geometry to sciences and engineering) · 90C53 (Methods of quasi-Newton type) · 65K05 (Numerical mathematical programming methods)
Authors are listed alphabetically.
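The cubic-regularized model at the heart of ARC can be illustrated in the Euclidean case: each step approximately minimizes m(s) = gᵀs + ½ sᵀHs + (σ/3)‖s‖³, which stays bounded below even when H has negative eigenvalues. The sketch below is illustrative only; the actual paper works on Riemannian manifolds, uses specialized subproblem solvers, and adapts σ with an acceptance test, none of which is shown here:

```python
import numpy as np

def arc_step(g, H, sigma, lr=0.1, iters=500):
    """Approximately minimize the ARC cubic model
        m(s) = g^T s + 0.5 * s^T H s + (sigma/3) * ||s||^3
    by plain gradient descent on s (an illustrative subproblem solver).
    """
    s = np.zeros_like(g)
    for _ in range(iters):
        # Gradient of the model: g + H s + sigma * ||s|| * s
        grad_m = g + H @ s + sigma * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s

# Nonconvex example: H has a negative eigenvalue, yet the cubic term
# keeps the model bounded below, so the step is well defined.
g = np.array([1.0, 0.0])
H = np.array([[1.0, 0.0], [0.0, -2.0]])
s = arc_step(g, H, sigma=1.0)
```

In this example the iterate converges to a stationary point of the model at s ≈ (−0.618, 0), where m(s) < 0 = m(0), i.e. the cubic step makes model progress despite the indefinite Hessian.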
We provide improved convergence rates for constrained convex-concave min-max problems and monotone variational inequalities with higher-order smoothness. In min-max settings where the pth-order derivatives are Lipschitz continuous, we give an algorithm, HigherOrderMirrorProx, that achieves an iteration complexity of O(1/T^((p+1)/2)) when given access to an oracle for finding a fixed point of a pth-order equation. We give analogous rates for the weak monotone variational inequality problem. For p > 2, our results improve upon the iteration complexity of the first-order Mirror Prox method of Nemirovski [2004] and the second-order method of Monteiro and Svaiter [2012]. We further instantiate our entire algorithm in the unconstrained p = 2 case.
In this work, we present new simple and optimal algorithms for solving the variational inequality (VI) problem for pth-order smooth, monotone operators, a problem that generalizes convex optimization and saddle-point problems. Recent works (Bullins and Lai (2020), Lin and Jordan (2021), Jiang and Mokhtari (2022)) present methods that achieve a rate of O(ε^(−2/(p+1))) for p ≥ 1, extending results of Nemirovski (2004) and Monteiro and Svaiter (2012) for p = 1, 2. A drawback to these approaches, however, is their reliance on a line search scheme. We provide the first pth-order method that achieves a rate of O(ε^(−2/(p+1))) without relying on a line search routine, thereby improving upon previous rates by a logarithmic factor. Building on the Mirror Prox method of Nemirovski (2004), our algorithm works even in the constrained, non-Euclidean setting. Furthermore, we prove the optimality of our algorithm by constructing matching lower bounds. These are the first lower bounds for smooth MVIs beyond convex optimization for p > 1. This establishes a separation between solving smooth MVIs and smooth convex optimization, and settles the oracle complexity of solving pth-order smooth MVIs.
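For context, the p = 1 base case that these higher-order methods build on is Nemirovski's Mirror Prox, which reduces to the extragradient method in the unconstrained Euclidean setting. A minimal sketch (step size, iteration count, and the example operator are illustrative choices, not from the papers):

```python
import numpy as np

def extragradient(F, x0, eta=0.1, iters=2000):
    """Extragradient / Euclidean Mirror Prox for a monotone operator F.

    Each iteration makes two operator calls:
        x_half = x - eta * F(x)        # extrapolation step
        x      = x - eta * F(x_half)   # update step
    """
    x = x0.copy()
    for _ in range(iters):
        x_half = x - eta * F(x)
        x = x - eta * F(x_half)
    return x

# Bilinear saddle point min_u max_v (u * v): the operator F(u, v) = (v, -u)
# is monotone, and plain gradient descent-ascent orbits or diverges on it,
# while extragradient converges to the solution (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z = extragradient(F, np.array([1.0, 1.0]))
```

The extrapolation step is what distinguishes the method from simultaneous gradient descent-ascent: evaluating F at the look-ahead point x_half damps the rotation that the bilinear coupling induces.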