2021
DOI: 10.48550/arxiv.2102.06280
Preprint

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers

Abstract: With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by waiting for and averaging the estimates obtained from its neighbors, then correcting it on the basis of its local dataset. However, the synchronization phase can be time-consuming …

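To make the update rule described in the abstract concrete, the sketch below runs one consensus-style iteration for a single worker: it averages its own estimate with the estimates received from its neighbors, then corrects the result with a gradient step on its local dataset. This is a minimal illustration only; the quadratic local loss, the step size, and all variable names are assumptions, not taken from the paper.

import numpy as np

# Minimal sketch of one consensus-style update for a worker, assuming
# synthetic data and a quadratic (least-squares) local loss. All names,
# shapes, and the step size are illustrative, not from the paper.

rng = np.random.default_rng(0)
dim = 10
lr = 0.1  # local step size (assumed)

# Local dataset of this worker: features A_i and targets b_i (synthetic).
A_i = rng.normal(size=(50, dim))
b_i = rng.normal(size=50)

def local_gradient(x):
    """Gradient of the local least-squares loss (1/2n) * ||A_i x - b_i||^2."""
    return A_i.T @ (A_i @ x - b_i) / len(b_i)

def consensus_step(x_i, neighbor_estimates):
    """Average the local estimate with all neighbor estimates,
    then correct the average with a local gradient step."""
    averaged = np.mean([x_i] + list(neighbor_estimates), axis=0)
    return averaged - lr * local_gradient(averaged)

# One iteration: the worker waits for estimates from two neighbors,
# averages them with its own estimate, then applies its local correction.
x_i = np.zeros(dim)
neighbors = [rng.normal(size=dim), rng.normal(size=dim)]
x_i = consensus_step(x_i, neighbors)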
Cited by 2 publications (3 citation statements)
References 31 publications (37 reference statements)
“…Since the term federated learning was introduced in the seminal work , there has been explosive growth in federated learning research. For example, one line of work focuses on designing algorithms that achieve higher learning accuracy and on analyzing their convergence, e.g., (Smith et al. 2017; Li et al. 2020b; Liu et al. 2020; Wang et al. 2020b; Xiong, Yan, and Li 2021). Another line of work aims to improve the communication efficiency between the central server and clients through compression or sparsification (Konečnỳ et al. 2016; Suresh et al. 2017; Xu et al. 2019), communication frequency optimization (Wang and Joshi 2019; Karimireddy et al. 2020), client selection (Lai et al. 2021; Wang et al. 2020a), etc.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unlike traditional centralized machine learning, the data samples of each client in FL follow a non-identical and independent distribution (non-IID), introducing bias that slows down or even derails training. A few recent studies address these challenges through model compression (Konečnỳ et al. 2016; Suresh et al. 2017), communication frequency optimization (Wang and Joshi 2019; Karimireddy et al. 2020), and client selection (Lai et al. 2021; Wang et al. 2020a; Xiong, Yan, and Li 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…The use of many trained models in a distributed context has been addressed by [13]. The proposed architectural approach sets up a dynamic set of backup workers together with a custom-designed consensus algorithm, and experiments show a linear speedup in convergence.…”
Section: Related Work (mentioning)
confidence: 99%