2020
DOI: 10.48550/arxiv.2009.11069
Preprint

Towards accelerated rates for distributed optimization over time-varying networks

Abstract: We study the problem of decentralized optimization over time-varying networks with strongly convex, smooth cost functions. In our approach, nodes run a multi-step gossip procedure after each gradient update, thus ensuring approximate consensus at every iteration, while the outer loop is based on an accelerated Nesterov scheme. The algorithm achieves precision ε > 0 in O(√κ_g χ log²(1/ε)) communication steps and O(√κ_g log(1/ε)) gradient computations at each node, where κ_g is the global function condition number …
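To make the loop structure described in the abstract concrete, below is a minimal NumPy sketch of the general pattern: a Nesterov-type outer loop in which every gradient update is followed by several gossip (consensus) rounds. It is an illustration under simplifying assumptions, not the authors' algorithm: the name accelerated_gossip_gd, the fixed mixing matrix W, the constant number of gossip rounds, and the step-size and momentum choices are placeholders, whereas the paper works over time-varying networks and ties the amount of gossip to the network quantity χ.

import numpy as np

def accelerated_gossip_gd(grads, W, x0, L, mu, n_outer=300, gossip_steps=5):
    """Hypothetical sketch: accelerated outer loop with a multi-step gossip
    inner loop. Not the paper's exact method."""
    n = len(grads)                                            # number of nodes
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)    # Nesterov momentum
    x = np.tile(np.asarray(x0, dtype=float), (n, 1))          # one iterate per node (rows)
    y = x.copy()
    for _ in range(n_outer):
        # each node takes a local gradient step at its extrapolated point
        g = np.stack([grads[i](y[i]) for i in range(n)])
        x_next = y - (1.0 / L) * g
        # multi-step gossip: repeated mixing with W to approximate consensus
        for _ in range(gossip_steps):
            x_next = W @ x_next
        # Nesterov extrapolation on the (approximately) averaged iterates
        y = x_next + beta * (x_next - x)
        x = x_next
    return x.mean(axis=0)

if __name__ == "__main__":
    # toy check: 4 nodes with quadratic objectives f_i(x) = 0.5 * ||A_i x - b_i||^2
    rng = np.random.default_rng(0)
    A_list = [rng.standard_normal((10, 3)) for _ in range(4)]
    b_list = [rng.standard_normal(10) for _ in range(4)]
    grads = [lambda x, A=A, b=b: A.T @ (A @ x - b) for A, b in zip(A_list, b_list)]
    W = np.full((4, 4), 0.25)                                     # complete-graph averaging matrix
    L = max(np.linalg.norm(A.T @ A, 2) for A in A_list)           # smoothness upper bound
    mu = min(np.linalg.eigvalsh(A.T @ A).min() for A in A_list)   # strong-convexity lower bound
    print(accelerated_gossip_gd(grads, W, np.zeros(3), L, mu))

Increasing gossip_steps tightens consensus at the price of more communication per outer iteration, which is the trade-off reflected in the χ factor of the communication complexity quoted above.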

Cited by 8 publications (22 citation statements)
References 22 publications
“…OPAPC, Accelerated Dual Ascent [Uribe et al, 2020, Alg. 3], APM-C [Li et al, 2018], Mudag [Ye et al, 2020a], Accelerated EXTRA [Li and Lin, 2020], DAccGD [Rogozin et al, 2020], and DPAG [Ye et al, 2020b]. L (resp.…”
Section: Contributions (mentioning, confidence: 99%)
“…• Acceleration over mesh networks: Given the focus of this work, we comment next only on distributed algorithms over mesh networks that employ some form of acceleration and are provably convergent; they are summarized in Table 1. Although substantially different (some are primal [Ye et al, 2020a, Ye et al, 2020b, Li and Lin, 2020, Rogozin et al, 2020], others are dual or penalty-based [Scaman et al, 2017, Uribe et al, 2020, Li et al, 2018] methods, applicable to special instances of (P) (mainly with r = 0) and subject to special design constraints, e.g., a positive semidefinite gossip matrix), they all achieve a linear convergence rate, with communication complexity scaling for some with √κ_ℓ (κ_ℓ = L_mx/μ_mn is the "local" condition number) and for others with √κ (κ = L/μ is the condition number of f). Note that in general κ ≪ κ_ℓ; hence the latter group is preferable to the former.…”
Section: Related Work (mentioning, confidence: 99%)
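As a side note on the κ versus κ_ℓ comparison in the statement above, the two condition numbers can be written out explicitly. This is a standard clarification rather than material from the quoted paper; it assumes f is the average of m local functions f_i and that L_mx = max_i L_i and μ_mn = min_i μ_i denote the worst-case local constants:

f(x) = \frac{1}{m}\sum_{i=1}^{m} f_i(x), \qquad
\kappa = \frac{L}{\mu}, \qquad
\kappa_\ell = \frac{L_{\mathrm{mx}}}{\mu_{\mathrm{mn}}} = \frac{\max_i L_i}{\min_i \mu_i}.

Since L \le \max_i L_i and \mu \ge \min_i \mu_i, one always has \kappa \le \kappa_\ell, which is why complexity bounds scaling with √κ are preferable to those scaling with √κ_ℓ.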
“…It was shown in [46] that to obtain ε-optimal solutions, the gradient computation complexity is lower bounded by O(√κ log(1/ε)), and the communication complexity is lower bounded by O(√(κ/θ) log(1/ε)). To obtain better complexities, many accelerated decentralized gradient-type methods have been developed (e.g., [11,12,16,18,20,21,22,23,24,38,41,42,43,46,54,57,61,62]). There exist dual-based methods such as [46] that achieve optimal complexities.…”
mentioning, confidence: 99%
“…In this paper, we focus on dual-free methods or gradient-type methods only. Some algorithms, for instance [16,22,23,41,42,61,62], rely on inner loops to guarantee desirable convergence rates. However, inner loops place a larger communication burden [24,38], which may limit the applicability of these methods, since communication has often been recognized as the major bottleneck in distributed and decentralized optimization.…”
mentioning, confidence: 99%