Abstract: Hierarchical SGD (H-SGD) has emerged as a new distributed SGD algorithm for multi-level communication networks. In H-SGD, before each global aggregation, workers send their updated local models to local servers for aggregation. Despite recent research efforts, the effect of local aggregation on global convergence still lacks theoretical understanding. In this work, we first introduce a new notion of "upward" and "downward" divergences. We then use it to conduct a novel analysis to obtain a worst-case convergence…
“…In FL, the participants work together to solve a finite-sum optimization problem with SGD, while in hierarchical FL (HFL), the hierarchical SGD (H-SGD) is adopted [7]. The main difference between SGD and H-SGD is that H-SGD requires several rounds of intermediate aggregation before global aggregation.…”
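The two-level aggregation pattern described above maps directly onto a short loop. Below is a minimal Python sketch of H-SGD with one layer of local servers; the toy scalar least-squares loss, learning rate, and round counts are illustrative assumptions, not the setup of the cited papers.

```python
import numpy as np

def local_sgd_steps(w, data, lr=0.01, steps=5):
    """A few SGD steps on one worker (toy scalar least-squares loss)."""
    for _ in range(steps):
        x, y = data[np.random.randint(len(data))]
        w -= lr * 2 * x * (x * w - y)      # gradient of (x*w - y)^2
    return w

def average(models):
    return sum(models) / len(models)

def h_sgd(groups, w_global=0.0, global_rounds=10, local_rounds=3):
    """Two-level H-SGD: `local_rounds` intermediate aggregations per global round."""
    for _ in range(global_rounds):
        group_models = []
        for workers in groups:                # one local server per group
            w_group = w_global
            for _ in range(local_rounds):     # intermediate (local) aggregation
                w_group = average([local_sgd_steps(w_group, d) for d in workers])
            group_models.append(w_group)
        w_global = average(group_models)      # global aggregation
    return w_global

# Hypothetical usage: two groups, each with two workers fitting y = 3x.
worker_data = [(1.0, 3.0), (2.0, 6.0)]
print(h_sgd([[worker_data, worker_data], [worker_data, worker_data]]))  # near 3.0
```

Setting `local_rounds` to 1 recovers plain single-level distributed SGD, which is exactly the structural difference the snippet points at.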
“…Here we introduce two assumptions that are important for the proof of convergence. The first one indicates the property of the loss function employed in our proposed BHFL framework, which has also been widely included in the existing studies [7], [17], [29]. The second ensures that the model updating process will not lead to a significant bias.…”
Section: Assumptions
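The snippet names the two assumptions without stating them. The LaTeX sketch below records the standard pair used in HFL convergence analyses, smoothness of each local loss and unbiased stochastic gradients with bounded variance; that BHFL uses exactly these forms is our assumption, inferred from the cited references [7], [17], [29].

```latex
% Assumption 1 (L-smoothness of each local loss F_i):
\|\nabla F_i(w) - \nabla F_i(v)\| \le L \,\|w - v\|, \qquad \forall\, w, v.
% Assumption 2 (unbiased stochastic gradients with bounded variance):
\mathbb{E}\big[g_i(w)\big] = \nabla F_i(w), \qquad
\mathbb{E}\big\|g_i(w) - \nabla F_i(w)\big\|^2 \le \sigma^2.
```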
“…Liu et al. [8] propose a client-edge-cloud HFL framework running with the Hier-FAVG aggregation algorithm and demonstrate that communication efficiency can be improved by introducing the hierarchical architecture in FL. Wang et al. [7] provide theoretical analysis of the convergence of HFL based on Stochastic Gradient Descent (SGD) and emphasize the importance of local aggregation before global aggregation. In [6], the focus is on protecting participants' privacy in HFL with flexible and decentralized control.…”
Section: Related Work
“…Hierarchical federated learning (HFL) provides a promising solution to the above challenge [4]–[7]. The basic idea is to conduct multiple intermediate aggregations at proxy servers (e.g., edge servers) before global aggregation on the central server.…”
Cloud-edge-device hierarchical federated learning (HFL) has been recently proposed to achieve communication-efficient and privacy-preserving distributed learning. However, there exist several critical challenges, such as the single point of failure and potential stragglers in both edge servers and local devices. To resolve these issues, we propose a decentralized and straggler-tolerant blockchain-based HFL (BHFL) framework. Specifically, a Raft-based consortium blockchain is deployed on edge servers to provide a distributed and trusted computing environment for global model aggregation in BHFL. To mitigate the influence of stragglers on learning, we propose a novel aggregation method, HieAvg, which utilizes the historical weights of stragglers to estimate the missing submissions. Furthermore, we optimize the overall latency of BHFL by jointly considering the constraints of global model convergence and blockchain consensus delay. Theoretical analysis and experimental evaluation show that our proposed BHFL based on HieAvg can converge in the presence of stragglers, which performs better than the traditional methods even when the loss function is non-convex and the data on local devices are non-independent and identically distributed (non-IID).
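The abstract says HieAvg fills in a straggler's missing submission from its historical weights but does not give the estimator. The Python sketch below shows one plausible realization, linear extrapolation from the device's last two submissions; the extrapolation rule, data structures, and the name `hieavg_style_aggregate` are our assumptions, not the paper's definition.

```python
def hieavg_style_aggregate(current, history):
    """Aggregate one round of submissions, estimating stragglers' missing
    weights from their submission history.

    current: dict device_id -> weights this round (None if the device straggled)
    history: dict device_id -> list of past submitted weights (oldest first)
    """
    filled = {}
    for dev, w in current.items():
        if w is not None:
            filled[dev] = w                              # on-time submission
        elif len(history[dev]) >= 2:
            w_prev, w_last = history[dev][-2], history[dev][-1]
            filled[dev] = w_last + (w_last - w_prev)     # assumed: linear extrapolation
        elif history[dev]:
            filled[dev] = history[dev][-1]               # fall back to last seen weights
    return sum(filled.values()) / len(filled)            # plain averaging
```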
“…Model parameters are aggregated at both the edge servers and the cloud server, for example in [6], [21], [23], while gradient aggregation is applied on both levels in [7], [8]. The mix of gradient and model parameter aggregation is proposed in [19], [20], where gradient aggregation is performed at the intra-set iterations and model aggregation at the inter-set iterations.…”
This paper presents a novel hierarchical federated learning algorithm within multiple sets that incorporates quantization for communication efficiency and demonstrates resilience to statistical heterogeneity. Unlike conventional hierarchical federated learning algorithms, our approach combines gradient aggregation in intra-set iterations with model aggregation in inter-set iterations. We offer a comprehensive analytical framework to evaluate its optimality gap and convergence rate, comparing these aspects with those of conventional algorithms. Additionally, we develop a problem formulation to derive optimal system parameters in a closed-form solution. Our findings reveal that our algorithm consistently achieves high learning accuracy over a range of parameters and significantly outperforms other hierarchical algorithms, particularly in scenarios with heterogeneous data distributions.
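A compact way to see the intra-set/inter-set split is the sketch below: quantized gradients are averaged inside a set and applied as one SGD step, while quantized models are averaged across sets. The uniform quantizer and the learning rate are illustrative assumptions; the paper's quantization scheme may differ.

```python
import numpy as np

def quantize(v, levels=16):
    """Uniform quantizer over the vector's own range (illustrative only)."""
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v
    step = (hi - lo) / (levels - 1)
    return lo + np.round((v - lo) / step) * step

def intra_set_round(w_set, worker_grads, lr=0.05):
    """Gradient aggregation inside a set: average quantized worker gradients,
    then take one SGD step on the shared set model."""
    g = np.mean([quantize(g) for g in worker_grads], axis=0)
    return w_set - lr * g

def inter_set_round(set_models):
    """Model aggregation across sets: average the quantized set models."""
    return np.mean([quantize(w) for w in set_models], axis=0)

# Hypothetical usage with a 4-dimensional model:
w = intra_set_round(np.zeros(4), [np.random.randn(4) for _ in range(3)])
w_global = inter_set_round([w, np.random.randn(4)])
```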
In this paper, we investigate the aggregated model quality maximization problem in hierarchical federated learning, whose decision version is proved NP-complete. We develop the mechanism MaxQ to maximize the sum of local model quality, which consists of two stages. In the first stage, an algorithm based on matching game theory is proposed to associate mobile devices with edge servers; it is proved to achieve stability and a 1/2-approximation ratio. In the second stage, we design an incentive mechanism based on contract theory to maximize the quality of models submitted by mobile devices to edge servers. Through thorough experiments, we analyse the performance of MaxQ and compare it with the existing mechanisms FAIR and EHFL under the deep learning models ResNet18, ResNet50, and AlexNet. It is found that the model quality can be improved by 8.20% and 7.81%, 10.47% and 11.87%, and 10.98% and 11.97% under the respective models.
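MaxQ's matching stage is not reproduced in the snippet. The sketch below shows the classic greedy heuristic for weighted matching under server capacities, which is the textbook way to obtain a 1/2-approximate device-to-server association; the quality scores, capacities, and names here are hypothetical, not the paper's algorithm.

```python
def greedy_match(quality, capacity):
    """Associate devices with edge servers greedily by descending quality.

    quality:  dict (device, server) -> model-quality gain if matched
    capacity: dict server -> maximum number of devices it can host
    Greedy selection in descending weight order is the classic
    1/2-approximation for maximum-weight matching.
    """
    assigned = {}
    load = {s: 0 for s in capacity}
    for (dev, srv), q in sorted(quality.items(), key=lambda kv: -kv[1]):
        if dev not in assigned and load[srv] < capacity[srv]:
            assigned[dev] = srv
            load[srv] += 1
    return assigned

# Hypothetical usage:
quality = {("d1", "e1"): 0.9, ("d1", "e2"): 0.4,
           ("d2", "e1"): 0.7, ("d2", "e2"): 0.5}
print(greedy_match(quality, {"e1": 1, "e2": 1}))  # {'d1': 'e1', 'd2': 'e2'}
```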