2017
DOI: 10.1137/15m1049695

On the Convergence Rate of Incremental Aggregated Gradient Algorithms

Abstract: Motivated by applications to distributed optimization over networks and large-scale data processing in machine learning, we analyze the deterministic incremental aggregated gradient method for minimizing a finite sum of smooth functions where the sum is strongly convex. This method processes the functions one at a time in a deterministic order and incorporates a memory of previous gradient values to accelerate convergence. Empirically it performs well in practice; however, no theoretical analysis with explicit…
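
The update the abstract describes can be sketched as follows. This is a minimal illustration of the incremental aggregated gradient idea, not the paper's own pseudocode; the function name iag, the argument names (grads, x0, step_size, n_passes), the constant step size, and the gradient-at-x0 initialization of the memory are assumptions made for the sketch.

    import numpy as np

    def iag(grads, x0, step_size, n_passes=100):
        """Minimal sketch of the deterministic incremental aggregated gradient (IAG) update.

        grads     -- list of callables; grads[i](x) returns the gradient of f_i at x
        x0        -- initial point (NumPy array)
        step_size -- constant step size, assumed small enough for the smooth,
                     strongly convex setting described in the abstract
        """
        n = len(grads)
        x = x0.copy()
        # Memory of the most recently evaluated gradient of each component f_i.
        memory = [g(x) for g in grads]
        agg = np.sum(memory, axis=0)           # running sum of the stored gradients
        for _ in range(n_passes):
            for i in range(n):                 # components visited in a fixed deterministic order
                g_new = grads[i](x)            # only component i is re-evaluated at the current point
                agg += g_new - memory[i]       # refresh the aggregated gradient in O(d) work
                memory[i] = g_new
                x = x - (step_size / n) * agg  # step along the average of (possibly stale) gradients
        return x

Each inner step evaluates a single component gradient while reusing stored, possibly stale, gradients for the other components; the convergence analysis in the paper concerns exactly this kind of update with delayed gradient information.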

Cited by 112 publications (127 citation statements)
References 22 publications

“…In particular, we show that in order to achieve an ε-optimal solution, the PIAG algorithm requires O(QK² log²(QK) log(1/ε)) iterations, or equivalently Õ(QK² log(1/ε)) iterations, where the tilde is used to hide the logarithmic terms in Q and K. This result improves upon the condition number dependence of the deterministic IAG for smooth problems [12], where the authors proved that to achieve an ε-optimal solution, the IAG algorithm requires O(Q²K² log(1/ε)) iterations. We also note that two recent independent papers [9,15] have analyzed the convergence rate of the prox-gradient algorithm (which is a special case of our algorithm with K = 0, i.e., where we have access to a full gradient at each iteration instead of an aggregated gradient) under strong convexity type assumptions and provided linear rate estimates.…”
mentioning
confidence: 75%
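
Restated side by side for readability (my restatement, keeping the quotation's notation: Q is the condition number, K the delay parameter with K = 0 meaning a full gradient is available at each iteration, and ε the target accuracy), the iteration bounds quoted above are

    \text{PIAG:}\quad O\!\left(QK^{2}\log^{2}(QK)\,\log(1/\varepsilon)\right)
      \;=\; \widetilde{O}\!\left(QK^{2}\log(1/\varepsilon)\right),
    \qquad
    \text{IAG [12]:}\quad O\!\left(Q^{2}K^{2}\log(1/\varepsilon)\right),

so the dependence on the condition number Q improves from quadratic to linear, up to logarithmic factors hidden in the tilde.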
“…This is in contrast with the recent analysis of the IAG algorithm provided in [12], which used distances of the iterates to the optimal solution as a Lyapunov function and relied on the smoothness of the problem to bound the gradient errors with distances. This approach does not extend to the non-smooth composite case, which motivates a new analysis using function values and the properties of the proximal operator.…”
mentioning
confidence: 94%
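
To make the contrast concrete (an illustration in generic notation, not taken from either paper): a distance-based analysis of the kind attributed to [12] tracks a Lyapunov quantity such as

    V_k = \|x_k - x^{\ast}\|^{2},

which can be tied to the gradient errors only through smoothness of the objective, whereas the function-value-based analysis described in the quotation tracks something like

    W_k = F(x_k) - F(x^{\ast}),

which remains meaningful when the composite objective F has a non-smooth part handled through the proximal operator.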