Secure Distributed Training at Scale

Gorbunov, Eduard; Borzunov, Alexander; Diskin, Michael; Ryabinin, Max

doi:10.48550/arxiv.2106.11257

Cited by 1 publication (3 citation statements)

References 44 publications

“…As. 2.4 Gorbunov et al [2021a] assume additionally that the tails of the noise distribution in stochastic gradients are sub-quadratic.…”

Section: Br-mvr (mentioning)

confidence: 99%

“…This approach is extended to the case of heterogeneous data and aggregators agnostic to the noise level by , and propose an extension to the decentralized optimization over fixed networks. Gorbunov et al [2021a] propose an alternative approach based on the usage of AllReduce [Patarasuk and Yuan, 2009] with additional verifications of correctness and show that their algorithm has complexity not worse than Parallel-SGD when the target accuracy is small enough. Wu et al [2020] are the first who applied variance reduction mechanism to tolerate Byzantine attacks (see the discussion above Q1).…”

Section: A Detailed Related Work (mentioning)

confidence: 99%

“…Wu et al [2020] are the first who applied variance reduction mechanism to tolerate Byzantine attacks (see the discussion above Q1). We also refer reader to , Rajput et al, 2019, Rodríguez-Barroso et al, 2020, Xu and Lyu, 2020, Alistarh et al, 2018, Allen-Zhu et al, 2021, Regatti et al, 2020, Yang and Bajwa, 2019a,b, Gupta et al, 2021, Peng et al, 2021 for other advances in Byzantine-robustness (see the detailed summaries in , Gorbunov et al, 2021a). We further progress the field by obtaining new theoretical SOTA convergence results in our work.…”

Section: A Detailed Related Work (mentioning)

confidence: 99%

Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top

Gorbunov, Horváth, Richtárik, et al. 2022

Preprint

Self Cite

Byzantine robustness has been gaining attention due to growing interest in collaborative and federated learning. However, many fruitful directions, such as using variance reduction to achieve robustness and communication compression to reduce communication costs, remain underexplored. This work addresses this gap and proposes Byz-VR-MARINA, a new Byzantine-tolerant method with variance reduction and compression. A key message of our paper is that variance reduction is the key to fighting Byzantine workers more effectively. At the same time, communication compression is a bonus that makes the process more communication-efficient. We derive theoretical convergence guarantees for Byz-VR-MARINA that outperform the previous state of the art for general non-convex and Polyak-Łojasiewicz loss functions. Unlike concurrent Byzantine-robust methods with variance reduction and/or compression, our complexity results are tight and do not rely on restrictive assumptions such as bounded gradients or limited compression. Moreover, we provide the first analysis of a Byzantine-tolerant method supporting non-uniform sampling of stochastic gradients. Numerical experiments corroborate our theoretical findings.
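The abstract describes Byz-VR-MARINA only at a high level. Below is a minimal toy sketch of the general MARINA-style pattern it refers to: in each round, with probability `p`, every worker sends a full local gradient; otherwise it sends the previous aggregated estimator plus a compressed gradient difference, and the server combines the messages with a robust aggregator. This is not the authors' algorithm. Every choice here is an assumption made for illustration: exact local gradients of hypothetical quadratic losses f_i(x) = 0.5‖x − b_i‖² (so the stochastic minibatching behind the paper's variance reduction is omitted), Rand-k sparsification as the compressor, coordinate-wise median as the aggregator, and a hypothetical attack in which the first `n_byz` workers send huge vectors. All function names are hypothetical.

```python
import random
import statistics


def rand_k(v, k, rng):
    """Unbiased Rand-k sparsifier (assumed compressor): keep k random coordinates, scale by d/k."""
    d = len(v)
    scale = d / k
    kept = set(rng.sample(range(d), k))
    return [v[i] * scale if i in kept else 0.0 for i in range(d)]


def coord_median(vectors):
    """Coordinate-wise median: one simple robust aggregator (an assumption here)."""
    return [statistics.median(col) for col in zip(*vectors)]


def local_grad(x, b):
    """Gradient of the hypothetical toy loss f_i(x) = 0.5 * ||x - b||^2."""
    return [xi - bi for xi, bi in zip(x, b)]


def corrupt(msgs, n_byz):
    """Hypothetical attack: the first n_byz workers replace their messages with huge vectors."""
    d = len(msgs[0])
    return [[1e6] * d if j < n_byz else m for j, m in enumerate(msgs)]


def byz_marina_sketch(targets, n_byz, steps=200, gamma=0.1, p=0.2, k=1, seed=0):
    """Toy MARINA-style loop: compressed gradient differences plus robust aggregation."""
    rng = random.Random(seed)
    d = len(targets[0])
    x = [0.0] * d
    # Start from robustly aggregated full local gradients.
    g = coord_median(corrupt([local_grad(x, b) for b in targets], n_byz))
    for _ in range(steps):
        x_new = [xi - gamma * gi for xi, gi in zip(x, g)]
        sync = rng.random() < p  # one shared coin per round
        msgs = []
        for b in targets:
            if sync:
                # Occasional round: every worker sends its full local gradient.
                msgs.append(local_grad(x_new, b))
            else:
                # Otherwise: previous estimator + compressed local gradient difference.
                diff = [a - c for a, c in zip(local_grad(x_new, b), local_grad(x, b))]
                msgs.append([gj + qj for gj, qj in zip(g, rand_k(diff, k, rng))])
        g = coord_median(corrupt(msgs, n_byz))
        x = x_new
    return x
```

For example, `byz_marina_sketch([[0.0, 0.0, 0.0]] * 2 + [[1.0, -2.0, 0.5]] * 7, n_byz=2)` runs the toy with two attackers among nine workers whose honest members share one target. Setting `p=1.0` makes every round a full synchronization (plain gradient descent with a median aggregator), and setting `k` equal to the dimension turns Rand-k into the identity; both are convenient sanity checks. With heterogeneous honest data, the coordinate-wise median generally differs from the honest average, which is one reason the choice of aggregator matters.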
