2019
DOI: 10.1109/tsp.2019.2926022

A Decentralized Proximal-Gradient Method With Network Independent Step-Sizes and Separated Convergence Rates

Abstract: This paper considers the problem of decentralized optimization with a composite objective containing smooth and non-smooth terms. To solve the problem, a proximal-gradient scheme is studied: the smooth and non-smooth terms are handled by a gradient update and a proximal update, respectively. The studied algorithm is closely related to a previous decentralized optimization algorithm, PG-EXTRA [37], but has a few advantages. First of all, in our new scheme, agents use uncoordinated step-sizes and the…
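To illustrate the gradient-update / proximal-update split described in the abstract, here is a minimal, generic sketch of a decentralized proximal-gradient loop. It is not the paper's algorithm; it only shows the standard structure, assuming each agent's local copy is a row of `x`, `W` is a symmetric doubly stochastic mixing matrix, and the non-smooth term is the ℓ1 norm (so its proximal operator is soft-thresholding). The names `grad_smooth`, `alpha`, and the function signatures are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding), a common non-smooth example."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def decentralized_prox_grad(grad_smooth, W, x0, alpha, num_iters, prox=prox_l1):
    """Generic decentralized proximal-gradient loop (illustrative sketch only).

    Each row of `x` is one agent's local copy; `W` mixes information between
    neighbors. The smooth term enters through a gradient step, the non-smooth
    term through a proximal step.
    """
    x = x0.copy()
    for _ in range(num_iters):
        z = W @ x - alpha * grad_smooth(x)   # mixing + gradient update (smooth term)
        x = prox(z, alpha)                   # proximal update (non-smooth term)
    return x
```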

Cited by 196 publications (196 citation statements)
References 61 publications

Citation statements (ordered by relevance):

“…Although the update (3b) has two gradient evaluations, they are evaluated at successive iterates, so EXTRA can easily be implemented with one gradient evaluation per iteration by storing the previous gradient in memory. Several additional linear-rate algorithms have since been proposed [9,13,16,21,36,38,39]. Each of these methods has updates similar to (3) in that they require agents to store the previous iterate and/or gradient in memory.…”
Section: Introduction (mentioning)
confidence: 99%
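To make the bookkeeping in the excerpt above concrete, here is a minimal sketch of an EXTRA-style loop that caches the previous gradient so that each iteration makes a single gradient call. The update (3) referenced in the excerpt belongs to the citing paper and is not reproduced here; the recursion below uses the commonly stated EXTRA form with W̃ = (I + W)/2, and `grad`, `W`, `x0`, and `alpha` are assumed inputs.

```python
import numpy as np

def extra(grad, W, x0, alpha, num_iters):
    """EXTRA-style loop with one gradient evaluation per iteration (sketch).

    `grad(x)` is assumed to return the stacked local gradients (one row per
    agent); `W` is a symmetric doubly stochastic mixing matrix.
    """
    n = W.shape[0]
    I = np.eye(n)
    W_tilde = 0.5 * (I + W)                # common choice for the second mixing matrix

    x_prev = x0.copy()
    g_prev = grad(x_prev)                  # gradient at the initial point
    x = W @ x_prev - alpha * g_prev        # first (DGD-like) step

    for _ in range(num_iters):
        g = grad(x)                        # the only gradient call this iteration
        x_next = (I + W) @ x - W_tilde @ x_prev - alpha * (g - g_prev)
        x_prev, x = x, x_next
        g_prev = g                         # cache the current gradient for the next iteration
    return x
```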
“…Other distributed optimization algorithms solve (1) but lie outside the scope of the present work. These include algorithms involving dual decomposition [4,24,25], inexact dual methods [7], proximal algorithms [27], asynchronous algorithms [15,37], weakly convex cases [13,20,26], and accelerated methods [20,33,34]. Although linear convergence rates were obtained for many of the algorithms cited above, each algorithm differs in the nature and strength of its convergence guarantees.…”
Section: Introduction (mentioning)
confidence: 99%
“…Our focus is on the design of distributed algorithms for Problem (P) that provably converge at a linear rate. When G = 0, several distributed schemes enjoying such a property have been proposed in the literature; examples include EXTRA [1], AugDGM [2], NEXT [3], SONATA [4], [5], DIGing [6], NIDS [7], Exact Diffusion [8], MSDA [9], and the distributed algorithms in [10], [11], and [12]. When G ≠ 0, results are scarce; to our knowledge, the only two schemes available in the literature achieving a linear rate for (P) are SONATA [5] and the distributed proximal gradient algorithm [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…Because of that, in general, they cannot achieve the rate of the centralized gradient algorithm (and thus do not fully address Q2). Works partially addressing Q2 are the following: MSDA [9] uses multiple communication steps to achieve the lower complexity bound of (P) when G = 0; and the algorithms in [16] and [7] achieve a linear rate and can adjust the number of communications performed at each iteration to match the rate of centralized gradient descent. However, it is not clear how to extend (if possible) these methods and their convergence analysis to the more general composite (i.e., G ≠ 0) setting (P).…”
Section: Introduction (mentioning)
confidence: 99%
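The excerpt above refers to methods that perform several communication rounds per gradient evaluation in order to approach the centralized rate. A minimal illustration of that idea, assuming a gossip matrix `W` and agent states stacked in the rows of `x` (both hypothetical names here), is simply to apply the mixing step several times per optimization step:

```python
import numpy as np

def multi_round_mixing(x, W, rounds):
    """Apply `rounds` neighbor-communication (gossip) steps to the stacked
    agent states `x`; more rounds yield a better effective spectral gap at
    the price of extra communication per iteration."""
    for _ in range(rounds):
        x = W @ x          # one communication round with neighbors
    return x
```

Accelerated variants (e.g., Chebyshev-type polynomials of W, as used in MSDA-style methods) achieve the same effect with fewer rounds; the plain repetition above is only the simplest version of the idea.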