2017
DOI: 10.48550/arxiv.1703.02757
Preprint

Byzantine-Tolerant Machine Learning

Abstract: The growth of data, the need for scalability and the complexity of models used in modern machine learning calls for distributed implementations. Yet, as of today, distributed machine learning frameworks have largely ignored the possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study the robustness to Byzantine failures at the fundamental level of stochastic gradient descent (SGD), the heart of most machine learning algorithms. Assuming a set of n workers, up to f of them being Byzantine, w…

Cited by 15 publications (39 citation statements)
References 13 publications
“…We note that with our construction, all the aggregation results {A r } r∈P from the successful rounds are independently and identically distributed (since each data owner performs local computations with M 0 data points, and each round aggregates results from N DOs). Therefore, according to Proposition 2 and Proposition 3 in (Blanchard et al., 2017b), as long as |P′| < (1 − 2µ)|P| − 2, where µ is the maximum fraction of the aggregation results that may be corrupted, the estimated overall gradient in (10) provides a close approximation of the true gradient, which leads to the convergence of the model training.…”
Section: Security Of Model Update
confidence: 97%
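The quoted tolerance condition can be sanity-checked numerically. Below is a minimal sketch of the check |P′| < (1 − 2µ)|P| − 2; the function name and the example round counts are illustrative, not taken from the cited work.

```python
def krum_tolerance_ok(n_selected, n_total, mu):
    """Check the quoted condition |P'| < (1 - 2*mu)*|P| - 2, under which
    the aggregated gradient stays close to the true gradient."""
    return n_selected < (1 - 2 * mu) * n_total - 2

# Example: selecting 5 of 20 rounds with at most 20% corrupted results.
# The bound is (1 - 0.4) * 20 - 2 = 10, and 5 < 10, so the condition holds.
print(krum_tolerance_ok(5, 20, 0.2))   # True
print(krum_tolerance_ok(12, 20, 0.2))  # False: 12 exceeds the bound of 10
```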
“…To combat malicious data owners uploading faulty computation results to the contract, we employ the m-Krum algorithm from (Blanchard et al., 2017b) to select the aggregation results from a subset P′ ⊂ P, which are considered to be close to the expected value with respect to the underlying data distribution.…”
Section: Security Of Model Update
confidence: 99%
“…One major category of these algorithms is called gradient filters [49], or robust gradient aggregation [7,19], which are designed and used mainly with (distributed) gradient descent (abbr. DGD) [79].…”
Section: Gradient Descent With Gradient Filters
confidence: 99%
“…Multi-Krum [6,7] is a variant of Krum. Instead of selecting one vector, multi-Krum selects m vectors and averages them, where m is a hyperparameter.…”
confidence: 99%
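The selection rule the quote describes can be sketched as follows: each worker's gradient is scored by its summed squared distance to its n − f − 2 nearest neighbours, and Multi-Krum averages the m lowest-scoring gradients. This is a minimal NumPy sketch; function names and the toy data are illustrative, not from the cited paper.

```python
import numpy as np

def krum_scores(grads, f):
    """Score each of the n gradients by the sum of squared distances
    to its n - f - 2 nearest neighbours (Krum's selection score)."""
    n = len(grads)
    dists = np.array([[np.sum((g - h) ** 2) for h in grads] for g in grads])
    scores = []
    for i in range(n):
        # Sort distances ascending; skip index 0 (self, distance 0)
        # and keep the n - f - 2 closest neighbours.
        nearest = np.sort(dists[i])[1:n - f - 1]
        scores.append(nearest.sum())
    return np.array(scores)

def multi_krum(grads, f, m):
    """Average the m gradients with the lowest Krum scores."""
    scores = krum_scores(grads, f)
    chosen = np.argsort(scores)[:m]
    return np.mean([grads[i] for i in chosen], axis=0)

# Toy example: five honest 1-D gradients near 1.0 and one Byzantine outlier.
grads = [np.array([1.0 + 0.01 * i]) for i in range(5)] + [np.array([100.0])]
aggregated = multi_krum(grads, f=1, m=2)
print(aggregated)  # close to 1.0; the outlier is never selected
```

Setting m = 1 recovers plain Krum; larger m trades some robustness for a lower-variance average of the selected gradients.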
“…For example, a gradient descent machine learning algorithm that handles Byzantine failures is presented in [3,5]. A practical application is Google's Federated Learning, where 𝑚 worker machines each analyze 𝑁/𝑚 data samples, where 𝑁 is the total number of samples.…”
Section: Related Work
confidence: 99%