2021
DOI: 10.1007/978-3-030-67661-2_21
FedMAX: Mitigating Activation Divergence for Accurate and Communication-Efficient Federated Learning

Cited by 18 publications (19 citation statements)
References 6 publications
“…Results indicate that the attention aggregation outperforms gradient descent. Chen et al [62] investigate the problem of activation divergence that occurs during the aggregation of models in FL, which can drastically increase training time and reduce accuracy. The authors propose a solution to this problem: a method that maximizes the entropy of activation vectors across all FL participants.…”
Section: Aggregation Methodologies
confidence: 99%
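
The entropy-maximizing regularizer described in the statement above can be sketched as an extra loss term. A minimal PyTorch sketch, assuming a classifier whose last fully-connected activations are available; the function name fedmax_loss and the beta weight are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fedmax_loss(logits, activations, targets, beta=1.0):
    # Standard classification loss on the model outputs.
    ce = F.cross_entropy(logits, targets)
    # KL term pulling the softmaxed activation vector toward uniform;
    # minimizing it maximizes the entropy of the activations, which is
    # the mechanism the citing paper attributes to FedMAX.
    log_act = F.log_softmax(activations, dim=1)
    uniform = torch.full_like(log_act, 1.0 / activations.size(1))
    kl = F.kl_div(log_act, uniform, reduction="batchmean")
    return ce + beta * kl
```

Because every client regularizes toward the same uniform target, their activation distributions drift apart more slowly, which is what makes the aggregated model converge in fewer communication rounds.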
“…The authors of [45] proposed the FedProx algorithm, which allows partial amounts of local work to be collected from the agents; however, agents are selected randomly for each training round. According to the authors of [46], FedMAX outperforms FedProx in terms of communication rounds by limiting activation divergence across multiple devices.…”
Section: Literature Review
confidence: 99%
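
For context on the FedProx comparison above: the feature that lets partial local work remain useful is a proximal term added to each client's local objective. A minimal sketch under that assumption; the function name fedprox_penalty and the mu default are hypothetical.

```python
import torch

def fedprox_penalty(local_params, global_params, mu=0.01):
    # FedProx-style proximal term, (mu / 2) * ||w - w_global||^2,
    # keeping a partially trained local model close to the current
    # global model so its update can still be aggregated safely.
    penalty = 0.0
    for w, w_g in zip(local_params, global_params):
        penalty = penalty + torch.sum((w - w_g.detach()) ** 2)
    return 0.5 * mu * penalty
```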
“…This approach allows the number of epochs to be tuned based on the non-IIDness of the client data. While it addresses the weight-divergence issue of FedAvg, its convergence is slower at a higher number of epochs than other state-of-the-art algorithms [6,13,14,15]. FedMA offers the best accuracy and convergence speed in comparison, but comes with a significant compute cost on the client devices.…”
Section: FedAvg and Its Challenges
confidence: 99%
“…In this paper, we focus on the optimization-algorithm approach to address the non-IID challenge. While there are numerous state-of-the-art algorithms such as FedProx [12], FedMA [13], and FedMAX [14], these approaches have not been productized at a large scale, to the best of the authors' knowledge. Hence, we focus on the most widely deployed algorithm, FedAvg [1], and investigate improving its ability to handle non-IID data to the level of state-of-the-art algorithms like FedMA, FedProx, and FedMAX.…”
Section: Introduction
confidence: 99%
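
Since the statement above centers on FedAvg, here is a minimal sketch of its server-side aggregation step: a sample-count-weighted average of client model weights. The function name and the state_dict-of-tensors representation are assumptions for illustration.

```python
import torch

def fedavg(client_states, client_sizes):
    # Weighted average of client state_dicts; each client's weight is
    # proportional to its local sample count (assumes float tensors).
    total = float(sum(client_sizes))
    keys = client_states[0].keys()
    avg = {k: torch.zeros_like(client_states[0][k], dtype=torch.float32)
           for k in keys}
    for state, n in zip(client_states, client_sizes):
        for k in keys:
            avg[k] += (n / total) * state[k].float()
    return avg
```

Under non-IID client data this plain averaging is exactly where weight and activation divergence hurt, which is why the cited works layer corrections such as proximal terms (FedProx), matched averaging (FedMA), or activation-entropy regularization (FedMAX) on top of it.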