2022
DOI: 10.1609/aaai.v36i8.20832

Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD

Abstract: Hierarchical SGD (H-SGD) has emerged as a new distributed SGD algorithm for multi-level communication networks. In H-SGD, before each global aggregation, workers send their updated local models to local servers for aggregations. Despite recent research efforts, the effect of local aggregation on global convergence still lacks theoretical understanding. In this work, we first introduce a new notion of "upward" and "downward" divergences. We then use it to conduct a novel analysis to obtain a worst-case converge…
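The procedure described in the abstract can be pictured as a two-level training loop. The Python sketch below is only an illustration under assumed names (tau_local, tau_global, toy quadratic per-worker losses), not the paper's pseudocode: each worker runs local SGD steps, each local server averages the models of its own workers before a global aggregation, and the global server then averages the group models.

```python
import numpy as np

# Illustrative sketch of hierarchical SGD (H-SGD): workers run local SGD,
# local servers average the models of their group ("local aggregation"),
# and a global server periodically averages the group models
# ("global aggregation"). All names and the toy losses are assumptions
# made for this example, not the paper's pseudocode.

rng = np.random.default_rng(0)
dim, groups, workers_per_group = 5, 2, 3
lr, tau_local, tau_global, rounds = 0.1, 4, 2, 10

# Toy heterogeneous quadratic losses f_i(x) = 0.5 * ||x - a_i||^2.
targets = rng.normal(size=(groups, workers_per_group, dim))
models = np.zeros((groups, workers_per_group, dim))

for r in range(rounds):                       # one global round
    for _ in range(tau_global):               # local aggregations per global round
        for _ in range(tau_local):            # SGD steps per local aggregation
            noise = 0.01 * rng.normal(size=models.shape)
            grads = models - targets + noise  # stochastic gradient of the toy loss
            models -= lr * grads
        # Local aggregation: each local server averages its own workers.
        group_avg = models.mean(axis=1, keepdims=True)
        models = np.broadcast_to(group_avg, models.shape).copy()
    # Global aggregation: the global server averages the group models.
    global_avg = models.mean(axis=(0, 1), keepdims=True)
    models = np.broadcast_to(global_avg, models.shape).copy()

print("final global model:        ", models[0, 0])
print("optimum of the average loss:", targets.mean(axis=(0, 1)))
```

The inner averaging step is the "local aggregation" whose effect on global convergence the paper analyzes.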

Cited by 21 publications (9 citation statements)
References 10 publications (13 reference statements)
“…In FL, the participants work together to solve a finite-sum optimization problem with SGD, while in hierarchical FL (HFL), the hierarchical SGD (H-SGD) is adopted [7]. The main difference between SGD and H-SGD is that H-SGD requires several rounds of intermediate aggregation before global aggregation.…”
Section: Hierarchical Federated Learning (mentioning, confidence: 99%)
“…Here we introduce two assumptions that are important for the proof of convergence. The first one indicates the property of the loss function employed in our proposed BHFL framework, which has also been widely included in the existing studies [7], [17], [29]. The second ensures that the model updating process will not lead to a significant bias.…”
Section: Assumptions (mentioning, confidence: 99%)
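The quoted passage does not spell the two assumptions out. The forms below are the standard smoothness and bounded-divergence conditions used in this line of convergence analyses and are offered only as a plausible reading; the citing paper's exact statements may differ.

```latex
% Typical forms of the two assumptions alluded to above; the citing
% paper's exact statements may differ.
% (1) Each local loss F_i is L-smooth.
% (2) Local gradients stay close to the global gradient, so local
%     updates do not introduce a large bias.
\begin{align}
  \text{(Smoothness)} \quad
    & \|\nabla F_i(x) - \nabla F_i(y)\| \le L\,\|x - y\|
      \quad \text{for all } x, y, \\
  \text{(Bounded divergence)} \quad
    & \mathbb{E}\,\|\nabla F_i(x) - \nabla F(x)\|^2 \le \delta^2,
      \quad \text{where } F(x) = \tfrac{1}{N}\textstyle\sum_{i=1}^{N} F_i(x).
\end{align}
```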
“…aggregated at both the edge servers and the cloud server for example in [6], [21], [23], while gradient aggregation is applied on both levels in [7], [8]. The mix of gradient and model parameter aggregation is proposed in [19], [20], where gradient aggregation is performed at the intra-set iterations and model aggregation at the inter-set iterations.…”
Section: Introduction (mentioning, confidence: 99%)
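The distinction this passage draws, aggregating gradients versus aggregating updated model parameters at a server, can be made concrete with a minimal single-level sketch. The function names and toy quadratic losses below are assumptions made for illustration, not code from the cited works.

```python
import numpy as np

# Minimal contrast between the two aggregation styles, shown at a single
# server level. Toy losses f_i(x) = 0.5 * ||x - a_i||^2; all names are
# illustrative assumptions.

rng = np.random.default_rng(1)
dim, n_workers, lr, local_steps = 4, 3, 0.1, 5
targets = rng.normal(size=(n_workers, dim))

def gradient_aggregation(x):
    """Server averages the workers' single-step gradients, then updates."""
    grads = x[None, :] - targets             # gradient of each toy loss at x
    return x - lr * grads.mean(axis=0)

def model_aggregation(x):
    """Each worker runs several local SGD steps; server averages the models."""
    local_models = np.tile(x, (n_workers, 1))
    for _ in range(local_steps):
        local_models -= lr * (local_models - targets)
    return local_models.mean(axis=0)

x0 = np.zeros(dim)
print("after gradient aggregation:", gradient_aggregation(x0))
print("after model aggregation:   ", model_aggregation(x0))
```

With a single local step from a common model the two coincide; the difference appears once workers take several local steps before the server aggregates.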