This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish linear convergence rate, give explicit expressions for step-size choices that guarantee convergence to the optimum, and bound the associated convergence factors. The expressions have an explicit dependence on the degree of asynchrony and recover classical results under synchronous operation. Simulations and implementations on commercial compute clouds validate our findings.
Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state of the art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order O(1/ √ T )for general convex regularization functions, and the rate O(1/T ) for strongly convex regularization functions, whereT is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.
This paper presents a new incremental gradient algorithm for minimizing the average of a large number of smooth component functions based on delayed partial gradients. Even with a constant step size, which can be chosen independently of the maximum delay bound and the number of objective function components, the expected objective value is guaranteed to converge linearly to within some ball around the optimum. We derive an explicit expression that quantifies how the convergence rate depends on objective function properties and algorithm parameters such as step-size and the maximum delay. An associated upper bound on the asymptotic error reveals the trade-off between convergence speed and residual error. Numerical examples confirm the validity of our results.
Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state of the art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order O(1/ √ T )for general convex regularization functions, and the rate O(1/T ) for strongly convex regularization functions, whereT is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.
The event-driven and elastic nature of serverless runtimes makes them a very efficient and cost-effective alternative for scaling up computations. So far, they have mostly been used for stateless, data parallel and ephemeral computations. In this work, we propose using serverless runtimes to solve generic, large-scale optimization problems. Specifically, we build a master-worker setup using AWS Lambda as the source of our workers, implement a parallel optimization algorithm to solve a regularized logistic regression problem, and show that relative speedups up to 256 workers and efficiencies above 70% up to 64 workers can be expected. We also identify possible algorithmic and system-level bottlenecks, propose improvements, and discuss the limitations and challenges in realizing these improvements.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.