The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact). We provide a complexity analysis for this method based on the properties of the CG iteration and the quality of the Hessian approximation, and compare it with a method that employs a stochastic gradient iteration instead of the CG method. We report preliminary numerical results that illustrate the performance of inexact subsampled Newton methods on machine learning applications based on logistic regression.
In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the regular computation of full gradients, the proposed method reduces variance by increasing the sample size as needed. The decision to increase the sample size is governed by an inner product test that ensures that search directions are descent directions with high probability. We show that the inner product test improves upon the well known norm test, and can be used as a basis for an algorithm that is globally convergent on nonconvex functions and enjoys a global linear rate of convergence on strongly convex functions. Numerical experiments on logistic regression problems illustrate the performance of the algorithm.
Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains including machine learning, robotics and sensor networks. A distributed optimization method typically consists of two key components: communication and computation. More specifically, at every iteration (or every several iterations) of a distributed algorithm, each node in the network requires some form of information exchange with its neighboring nodes (communication) and the computation step related to a (sub)-gradient (computation). The standard way of judging an algorithm via only the number of iterations overlooks the complexity associated with each iteration. Moreover, various applications deploying distributed methods may prefer a different composition of communication and computation.Motivated by this discrepancy, in this work we propose an adaptive cost framework which adjusts the cost measure depending on the features of various applications. We present a flexible algorithmic framework, where communication and computation steps are explicitly decomposed to enable algorithm customization for various applications. We apply this framework to the well-known distributed gradient descent (DGD) method, and show that the resulting customized algorithms, which we call DGD t , NEAR-DGD t and NEAR-DGD + , compare favorably to their base algorithms, both theoretically and empirically. The proposed NEAR-DGD + algorithm is an exact first-order method where the communication and computation steps are nested, and when the number of communication steps is adaptively increased, the method converges to the optimal solution. We test the performance and illustrate the flexibility of the methods, as well as practical variants, on quadratic functions and classification problems that arise in machine learning, in terms of iterations, gradient evaluations, communications and the proposed cost framework.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.