2021
DOI: 10.48550/arxiv.2105.04851
Preprint

Improving the Transient Times for Distributed Stochastic Gradient Methods

Abstract: We consider the distributed optimization problem where n agents, each possessing a local cost function, collaboratively minimize the average of the n cost functions over a connected network. Assuming stochastic gradient information is available, we study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS), adapted from the Exact Diffusion method [37] and NIDS [11], and perform a non-asymptotic convergence analysis. We not only show that EDAS asymptotically achieves t…
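The abstract points to the Exact Diffusion [37] / NIDS [11] recursion that EDAS builds on. As a rough orientation only, here is a minimal sketch of one adapt-correct-combine round of an Exact-Diffusion-style update; the function name, the argument layout, and the constant stepsize `mu` are illustrative assumptions, and the adaptive stepsize schedule that defines EDAS is not reproduced here.

```python
import numpy as np

def exact_diffusion_step(X, Psi_prev, grads, W_bar, mu):
    """One synchronized Exact-Diffusion-style round (illustrative sketch, not the paper's code).

    X        : (n, d) current iterates, one row per agent
    Psi_prev : (n, d) adaptation variables from the previous round
    grads    : (n, d) stochastic gradients of the local costs evaluated at X
    W_bar    : (n, n) mixing matrix averaged with the identity, e.g. (W + np.eye(n)) / 2
    mu       : stepsize (EDAS would use an adaptive, decreasing schedule instead)
    """
    Psi = X - mu * grads       # adapt: local stochastic gradient step
    Phi = Psi + X - Psi_prev   # correct: cancels the steady-state bias of plain diffusion
    X_next = W_bar @ Phi       # combine: weighted averaging with neighbors
    return X_next, Psi
```

In a driver loop one would draw fresh stochastic gradients at `X_next` before the next call; a common initialization sets `Psi_prev = X` at the first round so that the correction term vanishes.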

Cited by 5 publications (9 citation statements) · References 37 publications

Citation statements (ordered by relevance):
“…Simultaneously and independently, a recent work [41] has established a similar result to this work. However, [41] studies the transient stage of D²/Exact-Diffusion for the strongly-convex scenario only, but not for the generally-convex scenario. We also prove a lower bound for D-SGD with homogeneous data which shows that D²/Exact-Diffusion's dependence on the network topology cannot be worse than that of D-SGD, and is always better under the heterogeneous setting.…”
Section: Related Work (supporting)
confidence: 84%
“…where the second line follows from (15). In light of the convexity of h and Jensen's inequality, for all t ≥ 1 we have that $h(\bar{z}_{t+1}) \le \frac{1}{n}\sum_{i=1}^{n} h(z^{i}_{t+1})$, and hence (53) implies…”
Section: B Proof of Lemma 5, B.1 Step 1: Descent Inequality for the Con... (mentioning)
confidence: 95%
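For readability, the Jensen step quoted above can be written out as follows; the overline notation $\bar{z}_{t+1}$ for the network average is inferred from the excerpt rather than taken verbatim from the citing paper.

```latex
% Convexity of h (Jensen's inequality) applied to the averaged iterate,
% with \bar{z}_{t+1} = \tfrac{1}{n}\sum_{i=1}^{n} z^{i}_{t+1} (assumed notation).
h(\bar{z}_{t+1})
  = h\!\left(\frac{1}{n}\sum_{i=1}^{n} z^{i}_{t+1}\right)
  \le \frac{1}{n}\sum_{i=1}^{n} h\!\left(z^{i}_{t+1}\right),
  \qquad t \ge 1.
```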
“…For this smooth formulation, variants of decentralized stochastic gradient descent (DSGD), e.g., [4,26,52,70], admit simple implementations yet provide competitive practical performance against centralized methods in homogeneous environments like data centers. When the data distributions across the network become heterogeneous, the performance of DSGD degrades significantly, both in practice and in theory [15,39,57,59,68]. To address this issue, stochastic methods that are robust to heterogeneous data have been proposed, e.g., D² [51], which is derived from primal-dual formulations [22,25,47,69], and GT-DSGD [29,63], which is based on gradient tracking [10,33,38,41,67].…”
Section: Literature Review (mentioning)
confidence: 99%
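For contrast with the bias-corrected methods the excerpt lists, here is a minimal sketch of the plain DSGD round it uses as the baseline, written with the same array conventions as the sketch above; the function name and the doubly stochastic `W` are assumptions for illustration, not the API of any cited work.

```python
import numpy as np

def dsgd_step(X, grads, W, alpha):
    """One synchronous DSGD round (illustrative sketch).

    X     : (n, d) current iterates, one row per agent
    grads : (n, d) stochastic gradients of the local costs at X
    W     : (n, n) doubly stochastic mixing matrix of the connected network
    alpha : stepsize
    """
    # combine-then-adapt: average with neighbors, then take a local stochastic gradient step
    return W @ X - alpha * grads
```

Unlike D²/Exact-Diffusion or gradient-tracking methods, this update retains a bias that grows with the heterogeneity of the local gradients, which is the degradation the excerpt describes.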
“…One line of research proposes new algorithms that are less sensitive to the network topology. For example, [66,23,65,57,1] removed the effect of data heterogeneity using the bias-correction techniques of [68,29,62,40,69], and [14,61,7,27] utilized periodic global averaging or multiple partial-averaging steps. All these methods have improved topology dependence.…”
Section: Related Work (mentioning)
confidence: 99%