Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
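As a rough illustration of the setting described above, the sketch below has each client run a few gradient steps on its own data and send only the resulting model to a server, which averages the models weighted by local dataset size. The abstract names no specific algorithm; this weighted-averaging scheme, the one-parameter model y = w * x, the toy data, and all function names are assumptions made for the example.

```python
from typing import List, Tuple

# One client's private data: (x, y) pairs that never leave the client.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.05, steps: int = 5) -> float:
    """Run a few gradient steps of squared-error loss for y = w * x on local data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, clients: List[Dataset]) -> float:
    """Server step: average the client models, weighted by local dataset size."""
    total = sum(len(d) for d in clients)
    return sum(len(d) * local_update(w, d) for d in clients) / total


def train(clients: List[Dataset], rounds: int = 50) -> float:
    """Repeat communication rounds starting from w = 0."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Toy data (hypothetical): both clients hold samples of y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
]
w_final = train(clients)
```

Only model parameters cross the client boundary here; the raw (x, y) pairs stay in each client's list, which is the property the abstract emphasizes.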
We introduce a new class of exact Minimum-Bandwidth Regenerating (MBR) codes for distributed storage systems, characterized by a low-complexity uncoded repair process that can tolerate multiple node failures. These codes consist of the concatenation of two components: an outer MDS code followed by an inner repetition code. We refer to the inner code as a Fractional Repetition code since it consists of splitting the data of each node into several packets and storing multiple replicas of each on different nodes in the system. Our model for repair is table-based and thus differs from the random access model adopted in the literature. We present constructions of Fractional Repetition codes based on regular graphs and Steiner systems for a large set of system parameters. The resulting codes are guaranteed to achieve the storage capacity for random access repair. The considered model motivates a new definition of capacity for distributed storage systems, which we call Fractional Repetition capacity. We provide upper bounds on this capacity, while a precise expression remains an open problem.
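The graph-based construction can be sketched as follows, under the assumption that storage nodes are the vertices of a regular graph and coded packets are its edges, so that every packet is replicated on exactly two nodes. The complete graph is used here only because it is the simplest regular graph; the function names and the repair-table format are hypothetical, and the outer MDS code is left out.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Packet = Tuple[int, int]  # a packet is identified by the edge (u, v) it sits on


def fr_code_from_complete_graph(n: int) -> Dict[int, Set[Packet]]:
    """Map each node (vertex) to the set of packets (incident edges) it stores."""
    storage: Dict[int, Set[Packet]] = {v: set() for v in range(n)}
    for edge in combinations(range(n), 2):
        u, v = edge
        storage[u].add(edge)
        storage[v].add(edge)
    return storage


def repair_table(storage: Dict[int, Set[Packet]], failed: int) -> Dict[Packet, int]:
    """For each packet lost with `failed`, name a surviving node holding a replica."""
    table: Dict[Packet, int] = {}
    for packet in storage[failed]:
        helpers = [v for v, pkts in storage.items() if v != failed and packet in pkts]
        table[packet] = helpers[0]
    return table


storage = fr_code_from_complete_graph(5)
table = repair_table(storage, failed=0)
```

Repair in this sketch is a pure lookup and copy: the replacement node reads one stored packet from each helper, with no decoding, which is the sense in which the repair is uncoded and table-based.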
The index coding problem has recently attracted significant attention from the research community due to its theoretical significance and applications in wireless ad-hoc networks. An instance of the index coding problem includes a sender that holds a set of information messages X = {x_1, …, x_k} and a set of receivers R. Each receiver ρ = (x, H) ∈ R needs to obtain a message x ∈ X and has prior side information comprising a subset H of X. The sender uses a noiseless communication channel to broadcast encodings of the messages in X to all clients. The objective is to find an encoding scheme that minimizes the number of transmissions required to satisfy the receivers' demands with zero error. In this paper, we analyze the relation between the index coding problem, the more general network coding problem and the problem of finding a linear representation of a matroid. In particular, we show that any instance of the network coding and matroid representation problems can be efficiently reduced to an instance of the index coding problem. Our reduction implies that many important properties of the network coding and matroid representation problems carry over to the index coding problem. Specifically, we show that vector linear codes outperform scalar linear codes and that vector linear codes are insufficient for achieving the optimum number of transmissions.
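A small instance may help make the problem statement concrete. The sketch below assumes the simplest favorable case, which is not taken from the paper: receiver i wants message x_i and already holds every other message as side information. A single broadcast of the bitwise XOR of all messages then satisfies every receiver, instead of k separate transmissions. Message values and function names are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import Dict, List


def broadcast_xor(messages: List[int]) -> int:
    """Sender: encode all messages into one transmission (a scalar linear code over GF(2))."""
    return reduce(xor, messages, 0)


def decode(broadcast: int, side_info: Dict[int, int]) -> int:
    """Receiver: cancel the known messages out of the broadcast to recover the wanted one."""
    return reduce(xor, side_info.values(), broadcast)


messages = [0b1010, 0b0111, 0b1100]  # x_1, x_2, x_3 as 4-bit values
tx = broadcast_xor(messages)

decoded = []
for i in range(len(messages)):
    # Receiver i holds every message except the one it wants.
    side = {j: m for j, m in enumerate(messages) if j != i}
    decoded.append(decode(tx, side))
```

With less side information a single XOR no longer suffices, and finding the minimum number of transmissions is the optimization problem the abstract refers to.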
We address the problem of securing distributed storage systems against eavesdropping and adversarial attacks. An important aspect of these systems is that nodes fail over time, which necessitates a repair mechanism in order to maintain a desired high system reliability. In such dynamic settings, an important security problem is to safeguard the system from an intruder who may come at different time instances during the lifetime of the storage system to observe and possibly alter the data stored on some nodes. In this scenario, we give upper bounds on the maximum amount of information that can be stored safely on the system. For an important operating regime of the distributed storage system, which we call the bandwidth-limited regime, we show that our upper bounds are tight and provide explicit code constructions. Moreover, we provide a way to shortlist the malicious nodes and expurgate the system.
The problem of providing privacy, in the private information retrieval (PIR) sense, to users requesting data from a distributed storage system (DSS), is considered. The DSS is coded by an (n, k, d) Maximum Distance Separable (MDS) code to store the data reliably on unreliable storage nodes. Some of these nodes can be spies which report to a third party, such as an oppressive regime, which data is being requested by the user. An information theoretic PIR scheme ensures that a user can satisfy its request while revealing no information on which data is being requested to the nodes. A user can trivially achieve PIR by downloading all the data in the DSS. However, this is not a feasible solution due to its high communication cost. We construct PIR schemes with low download communication cost. When there is b = 1 spy node in the DSS, in other words, no collusion between the nodes, we construct PIR schemes with download cost 1/(1−R) per unit of requested data (R = k/n is the code rate), achieving the information theoretic limit for linear schemes. The proposed schemes are universal since they depend on the code rate, but not on the generator matrix of the code. Also, if b ≤ n − δk nodes collude, with δ = ⌊(n−b)/k⌋, we construct linear PIR schemes with download cost (b+δk)/δ.
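The two download costs quoted above can be evaluated for concrete parameters. The sketch below uses exact rational arithmetic and takes δ to be the integer ⌊(n−b)/k⌋, which is an assumption about the intended definition; the function names and the example parameters are hypothetical.

```python
from fractions import Fraction


def cost_no_collusion(n: int, k: int) -> Fraction:
    """Download cost 1/(1 - R) per unit of requested data, with code rate R = k/n."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n so that the rate R = k/n is below 1")
    rate = Fraction(k, n)
    return 1 / (1 - rate)


def cost_with_collusion(n: int, k: int, b: int) -> Fraction:
    """Download cost (b + delta*k)/delta, assuming delta = floor((n - b)/k)."""
    delta = (n - b) // k
    if delta < 1:
        raise ValueError("need n - b >= k so that delta is at least 1")
    return Fraction(b + delta * k, delta)


# Example parameters (hypothetical): a rate-1/2 code, then two colluding nodes.
c1 = cost_no_collusion(n=4, k=2)
c2 = cost_with_collusion(n=6, k=2, b=2)
c3 = cost_with_collusion(n=5, k=2, b=2)
```

For comparison, the trivial scheme of downloading everything costs as many units as there are stored files, independent of n and k, which is why a cost that depends only on the code rate is attractive.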