Federated Learning on Non-IID Data: A Survey

Zhu, Hangyu; Xu, Jinjin; Liu, Shiqing; Jin, Yaochu

doi:10.48550/arxiv.2106.06843

Cited by 15 publications

(20 citation statements)

References 139 publications

(154 reference statements)

“…6, it is clear to see that the test performances on these three datasets are relatively insensitive to different numbers of clients. These results make sense since non-parametric models trained in the VFL setting have nearly the same performance as those trained in the standard centralized learning [44]. By contrast, however, the average test accuracy on the Credit dataset drops from 0.82 to 0.8 when the number of clients is reduced to 2.…”

Section: B Sensitivity Analysis (mentioning)

confidence: 93%

PIVODL: Privacy-preserving vertical federated learning over distributed labels

Zhu, Wang, Jin et al. 2021

Preprint

Self Cite

Federated learning (FL) is an emerging privacy preserving machine learning protocol that allows multiple devices to collaboratively train a shared global model without revealing their private local data. Non-parametric models like gradient boosting decision trees (GBDT) have been commonly used in FL for vertically partitioned data. However, all these studies assume that all the data labels are stored on only one client, which may be unrealistic for real-world applications. Therefore, in this work, we propose a secure vertical FL framework, named PIVODL, to train GBDT with data labels distributed on multiple devices. Both homomorphic encryption and differential privacy are adopted to prevent label information from being leaked through transmitted gradients and leaf values. Our experimental results show that both information leakage and model performance degradation of the proposed PIVODL are negligible.

“…• IID-ness of client datasets. The test accuracy of a model learned using FL is adversely impacted if the datasets on the remote clients are not independent and identically distributed (IID) [80,81]. The fundamental reason for this performance degradation is that when the client datasets are non-IID or heterogeneous, the local models trained on the clients may diverge, despite having the same initial parameters.…”

Section: Key Elements of a Federated Learning System (mentioning)

confidence: 99%

FLAME: Federated Learning Across Multi-device Environments

Cho, Mathur, Kawsar 2022

Preprint

Federated Learning (FL) enables distributed training of machine learning models while keeping personal data on user devices private. While we witness increasing applications of FL in the area of mobile sensing, such as human-activity recognition (HAR), FL has not been studied in the context of a multi-device environment (MDE), wherein each user owns multiple data-producing devices. With the proliferation of mobile and wearable devices, MDEs are becoming increasingly popular in ubicomp settings, therefore necessitating the study of FL in them. FL in MDEs is characterized by high non-IID-ness across clients, complicated by the presence of both user and device heterogeneities. Further, ensuring efficient utilization of system resources on FL clients in an MDE remains an important challenge. In this paper, we propose FLAME, a user-centered FL training approach to counter statistical and system heterogeneity in MDEs, and bring consistency in inference performance across devices. FLAME features (i) user-centered FL training utilizing the time alignment across devices from the same user; (ii) accuracy- and efficiency-aware device selection; and (iii) model personalization to devices. We also present an FL evaluation testbed with realistic energy drain and network bandwidth profiles, and a novel class-based data partitioning scheme to extend existing HAR datasets to a federated setup. Our experiment results on three multi-device HAR datasets show that FLAME outperforms various baselines with a 4.8-33.8% higher F1 score, 1.02-2.86× greater energy efficiency, and up to 2.02× speedup in convergence to target accuracy through fair distribution of the FL workload.

CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing systems and tools; • Computing methodologies → Cooperation and coordination.
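The citation statement above notes that local models trained on non-IID client data may diverge even when they start from the same initial parameters. The following is a toy sketch of that effect, not the method of any cited paper: each hypothetical client fits a one-parameter least-squares model, and "non-IID" is approximated (as an assumption) by giving the clients different underlying slopes. The names `local_sgd`, `fedavg_round`, and `make_client` are illustrative only.

```python
def local_sgd(w, data, lr=0.1, steps=5):
    """Run a few full-batch gradient steps on mean((w*x - y)^2) / 2."""
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(w_global, client_data, lr=0.1, steps=5):
    """One FedAvg-style round: local training, then a size-weighted average.

    Returns the new global weight and the spread (max - min) of the
    local weights, used here as a crude measure of client divergence.
    """
    local = [local_sgd(w_global, d, lr, steps) for d in client_data]
    total = sum(len(d) for d in client_data)
    w_new = sum(len(d) * w for d, w in zip(client_data, local)) / total
    spread = max(local) - min(local)
    return w_new, spread


def make_client(slope, xs=(1.0, 2.0, 3.0)):
    """Hypothetical client dataset: noiseless points on y = slope * x."""
    return [(x, slope * x) for x in xs]


# Same data distribution on both clients (IID-like toy case).
iid_clients = [make_client(2.0), make_client(2.0)]
# Different local optima on each client (non-IID-like toy case).
non_iid_clients = [make_client(1.0), make_client(3.0)]

w_iid, iid_spread = fedavg_round(0.0, iid_clients)
w_non_iid, non_iid_spread = fedavg_round(0.0, non_iid_clients)

print("IID spread:", iid_spread)
print("non-IID spread:", non_iid_spread)
```

In this sketch the two identical clients end the round with identical local weights, while the heterogeneous clients each drift toward their own optimum; the server then has to average models that no longer agree.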

“…Periodically, the updated global model is pushed to each device, allowing individual users to benefit from the experience of the model on other devices. Federated Learning is not a trivial problem as the heterogeneous (non-iid) data generated by different clients causes the local models to diverge, making it difficult to aggregate the local updates into a global update [33]. Despite being a relatively new research field, Federated Learning is already deployed in practice [34].…”

Section: Retraining and Personalizing Models (mentioning)

confidence: 99%

TinyMLOps: Operational Challenges for Widespread Edge AI Adoption

Leroux, Simoens, Lootus et al. 2022

Preprint

Deploying machine learning applications on edge devices can bring clear benefits such as improved reliability, latency and privacy but it also introduces its own set of challenges. Most works focus on the limited computational resources of edge platforms but this is not the only bottleneck standing in the way of widespread adoption. In this paper we list several other challenges that a TinyML practitioner might need to consider when operationalizing an application on edge devices. We focus on tasks such as monitoring and managing the application, common functionality for a MLOps platform, and show how they are complicated by the distributed nature of edge deployment. We also discuss issues that are unique to edge applications such as protecting a model's intellectual property and verifying its integrity.
