FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity

Gong, Yonghai; Li, Yichuan; Freris, Nikolaos M.

doi:10.1109/icde53745.2022.00238

Cited by 17 publications

(9 citation statements)

References 17 publications

“…While some limited works, e.g. [6], [7], [19], [23], have derived convergence results for federated learning algorithms without relying on data similarity assumptions, their results are confined to specific algorithms with fixed step sizes and cannot be extended to analyze other federated algorithms. Notably, our work makes a significant contribution by expanding the results in [6], [7], which only cover (strongly) convex problems, and by generalizing the results in [23], which requires restrictive assumptions on the Lipschitz continuity of the Hessian and on the bounded 4th moment of the variance, i.e.…”

Section: Prior Work (mentioning)

confidence: 99%

“…Similarly, SCAFFOLD [16], FedSplit [17], and FedPD [18] harness variance reduction, operator splitting, and ADMM techniques respectively. FedPD was later refined into FedADMM [19] to expedite convergence.…”

Section: Introduction (mentioning)

confidence: 99%

“…In addition, many of the results only apply when the data similarity is high enough. Without these assumptions, the convergence of FedAvg for (strongly-) convex problems and FedADMM for non-convex problems are shown by [6] and [19], respectively. Their existing proof techniques and results cannot be applied to other federated learning algorithms, and they are limited to fixed step size strategies.…”

Section: Introduction (mentioning)

confidence: 99%

Comparing NARS and Reinforcement Learning: An Analysis of ONA and Q-Learning Algorithms

Beikmohammadi

Magnússon

2023

Lecture Notes in Computer Science

Data similarity assumptions have traditionally been relied upon to understand the convergence behaviors of federated learning methods. Unfortunately, this approach often demands fine-tuning step sizes based on the level of data similarity. When data similarity is low, these small step sizes result in an unacceptably slow convergence speed for federated methods. In this paper, we present a novel and unified framework for analyzing the convergence of federated learning algorithms without the need for data similarity conditions. Our analysis centers on an inequality that captures the influence of step sizes on algorithmic convergence performance. By applying our theorems to well-known federated algorithms, we derive precise expressions for three widely used step size schedules: fixed, diminishing, and step-decay step sizes, which are independent of data similarity conditions. Finally, we conduct comprehensive evaluations of the performance of these federated learning algorithms, employing the proposed step size strategies to train deep neural network models on benchmark datasets under varying data similarity conditions. Our findings demonstrate significant improvements in convergence speed and overall performance, marking a substantial advancement in federated learning research.
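The three step-size schedules named in the abstract above (fixed, diminishing, step-decay) can be sketched as plain functions of the round index t. This is a minimal illustration only: the specific forms used here (eta0 / sqrt(t + 1) for the diminishing schedule, multiplying by `decay` every `period` rounds for step decay) and the parameter names `eta0`, `decay`, and `period` are assumptions chosen for the example, not the expressions derived in the paper.

```python
import math


def fixed_step(eta0: float, t: int) -> float:
    """Fixed schedule: the same step size eta0 in every round t."""
    return eta0


def diminishing_step(eta0: float, t: int) -> float:
    """Diminishing schedule (one common choice, assumed here): eta0 / sqrt(t + 1)."""
    return eta0 / math.sqrt(t + 1)


def step_decay(eta0: float, t: int, decay: float = 0.5, period: int = 10) -> float:
    """Step-decay schedule: eta0 is multiplied by `decay` once every `period` rounds."""
    return eta0 * decay ** (t // period)


if __name__ == "__main__":
    # Compare the schedules over the first few communication rounds.
    for t in (0, 5, 10, 20):
        print(t, fixed_step(0.1, t), diminishing_step(0.1, t), step_decay(0.1, t))
```

The fixed schedule never shrinks, the diminishing one shrinks every round, and step decay holds the step constant within each period and drops it in discrete jumps, which is why it is popular for training deep networks.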

“…For example, VRL-SGD and SCAFFOLD do not consider HLU to address the system heterogeneity, and FedProx and FedNova still suffer from convergence slowdown caused by non-i.i.d. data. Interestingly, recent findings show that primal-dual FL methods based on the alternating direction method of multipliers (ADMM) (Boyd et al 2010) (Hajinezhad et al 2016) are inherently resilient to both data and system heterogeneity, see, e.g., FedPD (Zhang et al 2021), FedADMM (Gong, Li, and Freris 2022) and FedDyn (Acar et al 2021). However, their convergences rely on the constant and uniform client sampling, and the requirement of the clients to either solve the local subproblems globally or to a sufficient accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

Beyond ADMM: A Unified Client-Variance-Reduced Adaptive Federated Learning Framework

Wang

Wang

et al. 2023

AAAI

As a novel distributed learning paradigm, federated learning (FL) faces serious challenges in dealing with massive numbers of clients with heterogeneous data distributions and computation and communication resources. Various client-variance-reduction schemes and client sampling strategies have been introduced to improve the robustness of FL. Among others, primal-dual algorithms such as the alternating direction method of multipliers (ADMM) have been found to be resilient to data distribution and to outperform most primal-only FL algorithms. However, the reason behind this remains a mystery. In this paper, we first reveal that federated ADMM is essentially a client-variance-reduced algorithm. While this explains the inherent robustness of federated ADMM, its vanilla version lacks the ability to adapt to the degree of client heterogeneity. Besides, the global model at the server under client sampling is biased, which slows down practical convergence. To go beyond ADMM, we propose a novel primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and the bias of the global model. In addition, FedVRA unifies several representative FL algorithms in the sense that they are either special instances of FedVRA or are close to it. Extensions of FedVRA to semi-supervised and unsupervised learning are also presented. Experiments based on (semi-)supervised image classification tasks demonstrate the superiority of FedVRA over existing schemes in learning scenarios with massive numbers of heterogeneous clients and client sampling.
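To make the primal-dual structure discussed above concrete, the following sketch runs standard consensus ADMM (in the form given by Boyd et al. 2010) on toy scalar client losses f_i(x) = 0.5 * a_i * (x - b_i)^2 whose minimizers b_i differ across clients. It assumes full client participation and exact local solves (closed form for quadratics); it is not the FedADMM or FedVRA update rule, and the function name `consensus_admm` and its parameters are hypothetical.

```python
from typing import List, Tuple


def consensus_admm(a: List[float], b: List[float], rho: float = 1.0,
                   rounds: int = 300) -> Tuple[float, List[float], List[float]]:
    """Standard consensus ADMM for min_z sum_i 0.5 * a[i] * (z - b[i])**2.

    Client i keeps a local model x[i] and a dual variable y[i];
    the server keeps the global model z.
    """
    n = len(a)
    z = 0.0
    x = [0.0] * n
    y = [0.0] * n
    for _ in range(rounds):
        # Client step: exact minimizer of the local augmented Lagrangian
        #   0.5*a_i*(x - b_i)**2 + y_i*(x - z) + (rho/2)*(x - z)**2
        x = [(a[i] * b[i] - y[i] + rho * z) / (a[i] + rho) for i in range(n)]
        # Server step: average of x_i + y_i / rho over all clients
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        # Client dual ascent on the consensus constraint x_i = z
        y = [y[i] + rho * (x[i] - z) for i in range(n)]
    return z, x, y


if __name__ == "__main__":
    z, x, y = consensus_admm([1.0, 2.0, 4.0], [-3.0, 0.0, 5.0])
    print(z, x, y)
```

At convergence every x_i matches z = sum(a_i * b_i) / sum(a_i), and each dual variable y_i equals -a_i * (z - b_i), the negative local gradient at the consensus point. The duals therefore absorb each client's gradient offset, which is one informal way to read the client-variance-reduction interpretation stated in the abstract above.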

“…The Alternating Direction Method of Multipliers (ADMM) is an iterative algorithm that transforms optimization problems into an augmented Lagrangian function and updates primal and dual variables alternately to reach the optimal solution [13]. ADMM has been shown to achieve higher solution accuracy in various disciplines, such as matrix completion and separation [79], [100], compressive sensing [16], [103], and machine learning [27], [62], [108], [114], [117]. Moreover, as a primal-dual scheme, ADMM is more stable.…”

Section: Introduction (mentioning)

confidence: 99%

Learning to Optimize DAG Scheduling in Heterogeneous Environment

Zhou

Luo

et al. 2022

2022 23rd IEEE International Conference on Mobile Data Management (MDM)

Statistical heterogeneity is a root cause of tension among accuracy, fairness, and robustness of federated learning (FL), and is key in paving a path forward. Personalized federated learning (PFL) is an approach that aims to reduce the impact of statistical heterogeneity by developing personalized models for individual users, while also inherently providing benefits in terms of fairness and robustness. However, existing PFL frameworks focus on improving the performance of personalized models while neglecting the global model. This results in PFL suffering from lower solution accuracy when clients have different kinds of heterogeneous data. Moreover, these frameworks typically achieve sublinear convergence rates and rely on strong assumptions. In this paper, we employ the Moreau envelope as a regularized loss function and propose FLAME, an optimization framework that utilizes the alternating direction method of multipliers (ADMM) to train personalized and global models. Due to the gradient-free nature of ADMM, FLAME alleviates the need for tuning the learning rate during training of the global model. We demonstrate that FLAME can generalize to the existing PFL and FL frameworks. Moreover, we propose a model selection strategy to improve performance in situations where clients have different types of heterogeneous data. Our theoretical analysis establishes the global convergence and two kinds of convergence rates for FLAME under mild assumptions. Specifically, under the assumption of gradient Lipschitz continuity, we obtain a sublinear convergence rate. Further assuming the loss function is lower semicontinuous, coercive, and either real analytic or semialgebraic, we can obtain constant, linear, and sublinear convergence rates under different conditions. We also theoretically demonstrate that FLAME is more robust and fair than the state-of-the-art methods on a class of linear problems. We conduct thorough experiments using six schemes to partition non-i.i.d. data, confirming the performance comparison among state-of-the-art methods. Our experimental findings show that FLAME outperforms state-of-the-art methods in convergence and accuracy, achieves higher test accuracy under various attacks, and performs more uniformly across clients in terms of robustness and fairness.
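The idea of using a Moreau envelope as a regularized loss, mentioned in the abstract above, can be illustrated on a toy case. The sketch below assumes f(y) = |y| and the common parameterization M_lam f(x) = min_y f(y) + (x - y)^2 / (2 * lam); FLAME's own loss and scaling are not reproduced here, and the helper names `prox_abs`, `moreau_envelope_abs`, and `moreau_grad_abs` are hypothetical. For this f the envelope is the Huber function, and its gradient (x - prox(x)) / lam is (1/lam)-Lipschitz, which shows how the envelope smooths a nonsmooth loss.

```python
import math


def prox_abs(x: float, lam: float) -> float:
    """Proximal operator of f(y) = |y| with parameter lam (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def moreau_envelope_abs(x: float, lam: float) -> float:
    """Moreau envelope min_y |y| + (x - y)**2 / (2*lam), evaluated at the prox point."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)


def moreau_grad_abs(x: float, lam: float) -> float:
    """Gradient of the envelope, (x - prox(x)) / lam; it is (1/lam)-Lipschitz."""
    return (x - prox_abs(x, lam)) / lam


if __name__ == "__main__":
    # Quadratic near the origin (|x| <= lam), linear with offset lam/2 beyond it.
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(x, moreau_envelope_abs(x, 1.0), moreau_grad_abs(x, 1.0))
```

For |x| <= lam the envelope equals x^2 / (2 * lam), and for |x| > lam it equals |x| - lam / 2, so the kink of |x| at the origin is replaced by a smooth quadratic piece whose width is controlled by lam.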
