Federated learning, pioneered by Google, is a new class of machine learning models trained on distributed data sets and, equally important, a key privacy-preserving data technology. With huge amounts of data to analyse, organisations face three major challenges: a) data is spread across distributed and isolated data sets; b) analytics requires models to be trained across these independent data sets; and c) data sovereignty and privacy legislation is making it increasingly difficult to collect, share and analyse data. This paper reviews federated learning in terms of both a) a federated data infrastructure for privacy-preserving data access; and b) federated machine learning applied to distributed data sets. Given the pivotal role of federated learning, the contribution of this paper is to place it in perspective relative to other data science technologies. It discusses the privacy challenges facing data analytics, the relationship of federated learning to the major data infrastructure technologies, and the emerging machine learning algorithms that affect federated learning.
Federated learning is a pioneering privacy-preserving data technology and also a new kind of machine learning model trained on distributed data sets. Companies collect huge amounts of historic and real-time data to drive their business and to collaborate with other organizations. However, data privacy is becoming increasingly important because of regulations (e.g., EU GDPR) and the need to protect sensitive and personal data. Companies need to manage data access in two ways: first, within their own organizations, so that they can control staff access; and second, when collaborating with third parties, so that raw data remains protected. Moreover, companies are increasingly looking to 'monetize' the data they have collected. However, under new legislation, using data across different organizations is becoming increasingly difficult (Yu, 2016). Federated learning, pioneered by Google, is an emerging privacy-preserving data technology and a new class of distributed machine learning models. This paper discusses federated learning as a solution for privacy-preserving data access and for distributed machine learning applied to distributed data sets. It also presents a privacy-preserving federated learning infrastructure.