We report on Krum, the first provably Byzantine-tolerant aggregation rule for distributed Stochastic Gradient Descent (SGD). Krum guarantees the convergence of SGD even in a distributed setting where (asymptotically) up to half of the workers can be malicious adversaries trying to attack the learning system.
We present the first (practically) self-stabilizing replicated state machine for asynchronous message-passing systems. The scheme ensures that, starting from an arbitrary configuration, the replicated state machine eventually exhibits the desired behaviour for an execution long enough for all practical considerations.
This paper considers the fundamental problem of self-stabilizing leader election (SSLE) in the model of population protocols. In this model, an unknown number of asynchronous, anonymous and finite-state mobile agents interact in pairs over a given communication graph. SSLE has been shown to be impossible in the original model. This impossibility can be circumvented by a modular technique augmenting the system with an oracle, an external module abstracting the added assumption about the system. Fischer and Jiang have proposed solutions to SSLE, for complete communication graphs and rings, using an oracle Ω?, called the eventual leader detector. In this work, we present a solution for arbitrary graphs, using a composition of two copies of Ω?. We also prove that the difficulty comes from the requirement of self-stabilization, by giving an oracle-free solution for arbitrary graphs when a uniform initialization is allowed. Finally, we prove that there is no self-stabilizing implementation of Ω? using SSLE, in a sense we define precisely.
The growth of data, the need for scalability, and the complexity of models used in modern machine learning call for distributed implementations. Yet, as of today, distributed machine learning frameworks have largely ignored the possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study robustness to Byzantine failures at the fundamental level of stochastic gradient descent (SGD), the heart of most machine learning algorithms. Assuming a set of n workers, up to f of them being Byzantine, we ask how robust SGD can be, without limiting the dimension nor the size of the parameter space. We first show that no gradient descent update rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the update rule capturing the basic requirements to guarantee convergence despite f Byzantine workers. We finally propose Krum, an update rule that satisfies the aforementioned resilience property. For a d-dimensional learning problem, the time complexity of Krum is O(n² · (d + log n)).
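To make the selection rule concrete, below is a minimal sketch of Krum in Python/NumPy. The abstract only states the rule at a high level; the scoring used here (each worker's gradient is scored by the sum of squared distances to its n − f − 2 closest peers, and the lowest-scoring gradient is selected) and the n > 2f + 2 precondition follow the published description of Krum, but the function name, example data, and NumPy implementation are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def krum(gradients, f):
    """Illustrative sketch of the Krum aggregation rule.

    gradients: list of n d-dimensional numpy arrays, one per worker.
    f: assumed upper bound on the number of Byzantine workers.
    Returns the single proposed gradient with the lowest Krum score.
    """
    n = len(gradients)
    assert n > 2 * f + 2, "Krum requires n > 2f + 2"
    # Pairwise squared Euclidean distances between all proposed gradients.
    dists = np.array([[np.sum((gi - gj) ** 2) for gj in gradients]
                      for gi in gradients])
    scores = []
    for i in range(n):
        # Distances from worker i to the others, self-distance removed.
        others = np.sort(np.delete(dists[i], i))
        # Score: sum over the n - f - 2 closest neighbours.
        scores.append(np.sum(others[: n - f - 2]))
    return gradients[int(np.argmin(scores))]

# Hypothetical example: 7 workers, at most 2 Byzantine.
rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 0.1, size=10) for _ in range(5)]
byzantine = [rng.normal(-50.0, 1.0, size=10) for _ in range(2)]
g = krum(honest + byzantine, f=2)
print(g[:3])  # close to the honest gradients, not the outliers
```

The cost structure matches the complexity quoted above: computing the pairwise distances takes O(n² · d), and sorting each worker's n − 1 distances takes O(n² · log n), giving O(n² · (d + log n)) overall.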