Federated Learning (FL) is an emerging learning paradigm that enables collaborative model training across multiple devices using decentralized data, allowing each device to preserve the privacy of its local data. Heterogeneity of data distributions is an inherent characteristic of FL. Generally, data samples across user devices are Non-Independent and Identically Distributed (N-IID), which makes learning in federated settings a challenging task. In this paper, we aim to contribute to FL benchmarking by introducing PyFed, an open-source and scalable framework for simulating federated settings that supports N-IID data. PyFed is fully compatible with PySyft, the secure and private framework for deep learning. It includes a set of benchmark datasets and implements different types of N-IID data distributions. PyFed also provides a set of implementations that can be used as references for FL development.
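To make the notion of an N-IID data distribution concrete, the following is a minimal sketch of one common kind of skew, label skew, in which each simulated worker only sees a few classes. It is not PyFed's API: the function name `shard_partition`, its parameters, and the toy dataset are assumptions made for illustration, and the sort-and-shard scheme is just one of several ways to produce label-skewed partitions.

```python
import random
from collections import Counter


def shard_partition(labels, num_workers, shards_per_worker=2, seed=0):
    """Split sample indices into label-skewed (non-IID) worker partitions.

    Hypothetical helper for illustration; not part of PyFed or PySyft.
    Samples left over when the dataset does not divide evenly are dropped.
    """
    rng = random.Random(seed)
    # Sort indices by label so each contiguous shard covers very few classes.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_workers * shards_per_worker
    shard_size = len(order) // num_shards
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    # Shuffle the shards so that workers receive random class combinations.
    rng.shuffle(shards)
    partitions = []
    for w in range(num_workers):
        own = shards[w * shards_per_worker:(w + 1) * shards_per_worker]
        partitions.append([i for shard in own for i in shard])
    return partitions


# Toy dataset (assumed): 10 classes with 100 samples each.
labels = [c for c in range(10) for _ in range(100)]
parts = shard_partition(labels, num_workers=5)
for w, idx in enumerate(parts):
    # Print the per-class sample counts held by each worker.
    print(w, dict(Counter(labels[i] for i in idx)))
```

With this toy dataset every worker ends up holding samples from only two of the ten classes, whereas an IID split would give each worker roughly equal shares of all classes.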
Federated learning (FL) has been proposed as a machine learning approach to collaboratively learn a shared prediction model. During FL training, only a subset of workers participates in each round, and existing approaches introduce model bias when averaging the local model parameters of heterogeneous workers, which degrades the accuracy of the learned global model. In this paper, we introduce NIFL, a new strategy for worker selection that handles the statistical challenges of FL when local data is Non-Independent and Identically Distributed (N-IID). In NIFL, the server starts by sending a signal to the workers, which respond by sending the number of their samples. The server then selects a percentage of the workers with the highest number of samples and requests data statistics such as the mean and standard deviation. After that, the server calculates our proposed N-IID index, based on the statistical information collected from the workers without having access to their data, and uses this index as a criterion for worker selection. Finally, the server broadcasts the global model to the selected workers. NIFL takes into account the disparity in the distribution of workers' data in order to improve the performance of the model in heterogeneous data environments. We have performed several experiments with N-IID data. The results show that both the convergence of our method and its test accuracy increase considerably compared to other techniques, while computation and communication costs remain reasonable.
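The selection protocol described above can be sketched roughly as follows. The abstract does not define the N-IID index, so the `index` function below is a hypothetical stand-in (distance of a worker's mean and standard deviation from the sample-weighted average over the candidates), and preferring workers with a low index is likewise an assumption. The names `WorkerStats`, `summarize`, `select_workers`, `top_fraction`, and `num_selected`, as well as the toy data, are illustrative and not taken from the paper.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class WorkerStats:
    worker_id: str
    num_samples: int
    mean: float
    std: float


def summarize(worker_id, data):
    """Runs on the worker: only summary statistics leave the device."""
    return WorkerStats(worker_id, len(data), fmean(data), pstdev(data))


def select_workers(stats, top_fraction=0.5, num_selected=2):
    """Runs on the server, which never sees raw worker data."""
    # Step 1: keep the fraction of workers reporting the most samples.
    k = max(1, math.ceil(len(stats) * top_fraction))
    candidates = sorted(stats, key=lambda s: s.num_samples, reverse=True)[:k]

    # Step 2: build reference statistics as sample-weighted averages.
    total = sum(s.num_samples for s in candidates)
    ref_mean = sum(s.num_samples * s.mean for s in candidates) / total
    ref_std = sum(s.num_samples * s.std for s in candidates) / total

    # Step 3: hypothetical index (NOT the paper's definition): how far a
    # worker's statistics lie from the reference statistics.
    def index(s):
        return abs(s.mean - ref_mean) + abs(s.std - ref_std)

    # Step 4 (assumed criterion): pick the workers with the lowest index.
    ranked = sorted(candidates, key=index)
    return [s.worker_id for s in ranked[:num_selected]]


# Toy local datasets (assumed); worker "c" has a shifted distribution.
local_data = {
    "a": [4, 5, 6, 5, 4, 6, 5, 5],
    "b": [5, 5, 6, 4, 5, 5],
    "c": [20, 21, 19, 20, 22, 18, 20],
    "d": [5, 5],
    "e": [5],
}
reports = [summarize(wid, data) for wid, data in local_data.items()]
selected = select_workers(reports, top_fraction=0.5, num_selected=2)
print(selected)  # the server would broadcast the global model to these workers
```

In this toy run, workers "d" and "e" are filtered out in the first step because they hold too few samples, and the shifted worker "c" is ranked last by the stand-in index, so the global model would be sent to "a" and "b".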