2020
DOI: 10.48550/arxiv.2009.11069
Preprint

Towards accelerated rates for distributed optimization over time-varying networks

Abstract: We study the problem of decentralized optimization over time-varying networks with strongly convex, smooth cost functions. In our approach, nodes run a multi-step gossip procedure after each gradient update, thus ensuring approximate consensus at every iteration, while the outer loop is based on an accelerated Nesterov scheme. The algorithm achieves precision ε > 0 in O(√κ_g χ log²(1/ε)) communication steps and O(√κ_g log(1/ε)) gradient computations at each node, where κ_g is the global function condition number …
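To make the loop structure described in the abstract concrete, below is a minimal NumPy sketch of the general pattern: a Nesterov-type outer loop in which every gradient update is followed by several gossip (consensus) rounds. It is an illustration under simplifying assumptions, not the authors' algorithm: the name accelerated_gossip_gd, the fixed mixing matrix W, the constant number of gossip rounds, and the step-size and momentum choices are placeholders, whereas the paper works over time-varying networks and ties the amount of gossip to the network quantity χ.

import numpy as np

def accelerated_gossip_gd(grads, W, x0, L, mu, n_outer=300, gossip_steps=5):
    """Hypothetical sketch: accelerated outer loop with a multi-step gossip
    inner loop. Not the paper's exact method."""
    n = len(grads)                                            # number of nodes
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)    # Nesterov momentum
    x = np.tile(np.asarray(x0, dtype=float), (n, 1))          # one iterate per node (rows)
    y = x.copy()
    for _ in range(n_outer):
        # each node takes a local gradient step at its extrapolated point
        g = np.stack([grads[i](y[i]) for i in range(n)])
        x_next = y - (1.0 / L) * g
        # multi-step gossip: repeated mixing with W to approximate consensus
        for _ in range(gossip_steps):
            x_next = W @ x_next
        # Nesterov extrapolation on the (approximately) averaged iterates
        y = x_next + beta * (x_next - x)
        x = x_next
    return x.mean(axis=0)

if __name__ == "__main__":
    # toy check: 4 nodes with quadratic objectives f_i(x) = 0.5 * ||A_i x - b_i||^2
    rng = np.random.default_rng(0)
    A_list = [rng.standard_normal((10, 3)) for _ in range(4)]
    b_list = [rng.standard_normal(10) for _ in range(4)]
    grads = [lambda x, A=A, b=b: A.T @ (A @ x - b) for A, b in zip(A_list, b_list)]
    W = np.full((4, 4), 0.25)                                     # complete-graph averaging matrix
    L = max(np.linalg.norm(A.T @ A, 2) for A in A_list)           # smoothness upper bound
    mu = min(np.linalg.eigvalsh(A.T @ A).min() for A in A_list)   # strong-convexity lower bound
    print(accelerated_gossip_gd(grads, W, np.zeros(3), L, mu))

Increasing gossip_steps tightens consensus at the price of more communication per outer iteration, which is the trade-off reflected in the χ factor of the communication complexity quoted above.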

Cited by 8 publications (22 citation statements)
References 22 publications
“…OPAPC, Accelerated Dual Ascent [Uribe et al, 2020, Alg. 3], APM-C [Li et al, 2018], Mudag [Ye et al, 2020a], Accelerated EXTRA [Li and Lin, 2020], DAccGD [Rogozin et al, 2020], and DPAG [Ye et al, 2020b]. L (resp.…”
Section: Contributions (mentioning, confidence: 99%)
“…• Acceleration over mesh networks: Given the focus of this work, we comment next only on distributed algorithms over mesh networks that employ some form of acceleration and are provably convergent; they are summarized in Table 1. Although substantially different (some are primal [Ye et al, 2020a, Ye et al, 2020b, Li and Lin, 2020, Rogozin et al, 2020], others are dual or penalty-based [Scaman et al, 2017, Uribe et al, 2020, Li et al, 2018] methods, applicable to special instances of (P) (mainly with r = 0) and subject to special design constraints, e.g., a positive semidefinite gossip matrix), they all achieve a linear convergence rate, with communication complexity scaling for some with √κ_ℓ (κ_ℓ = L_mx/μ_mn is the "local" condition number) and for others with √κ (κ = L/μ is the condition number of f). Note that in general κ ≪ κ_ℓ; hence the latter group is preferable to the former.…”
Section: Related Work (mentioning, confidence: 99%)
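As a side note on the κ versus κ_ℓ comparison in the statement above, the two condition numbers can be written out explicitly. This is a standard clarification rather than material from the quoted paper; it assumes f is the average of m local functions f_i and that L_mx = max_i L_i and μ_mn = min_i μ_i denote the worst-case local constants:

f(x) = \frac{1}{m}\sum_{i=1}^{m} f_i(x), \qquad
\kappa = \frac{L}{\mu}, \qquad
\kappa_\ell = \frac{L_{\mathrm{mx}}}{\mu_{\mathrm{mn}}} = \frac{\max_i L_i}{\min_i \mu_i}.

Since L \le \max_i L_i and \mu \ge \min_i \mu_i, one always has \kappa \le \kappa_\ell, which is why complexity bounds scaling with √κ are preferable to those scaling with √κ_ℓ.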
“…It was shown in [46] that to obtain ε-optimal solutions, the gradient computation complexity is lower bounded by O(√κ log(1/ε)), and the communication complexity is lower bounded by O(√(κ/θ) log(1/ε)). To obtain better complexities, many accelerated decentralized gradient-type methods have been developed (e.g., [11,12,16,18,20,21,22,23,24,38,41,42,43,46,54,57,61,62]). There exist dual-based methods such as [46] that achieve optimal complexities.…”
mentioning, confidence: 99%
“…In this paper, we focus on dual-free methods or gradient-type methods only. Some algorithms, for instance [16,22,23,41,42,61,62], rely on inner loops to guarantee desirable convergence rates. However, inner loops place a larger communication burden [24,38], which may limit the applicability of these methods, since communication has often been recognized as the major bottleneck in distributed and decentralized optimization.…”
mentioning, confidence: 99%