2021
DOI: 10.48550/arxiv.2102.06280
Preprint

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers

Abstract: With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by waiting for and averaging the estimates obtained from its neighbors, then correcting it on the basis of its local dataset. However, the synchronization phase can be time-consuming …

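To make the update rule described in the abstract concrete, the sketch below runs one consensus-style iteration for a single worker: it averages its own estimate with the estimates received from its neighbors, then corrects the result with a gradient step on its local dataset. This is a minimal illustration only; the quadratic local loss, the step size, and all variable names are assumptions, not taken from the paper.

import numpy as np

# Minimal sketch of one consensus-style update for a worker, assuming
# synthetic data and a quadratic (least-squares) local loss. All names,
# shapes, and the step size are illustrative, not from the paper.

rng = np.random.default_rng(0)
dim = 10
lr = 0.1  # local step size (assumed)

# Local dataset of this worker: features A_i and targets b_i (synthetic).
A_i = rng.normal(size=(50, dim))
b_i = rng.normal(size=50)

def local_gradient(x):
    """Gradient of the local least-squares loss (1/2n) * ||A_i x - b_i||^2."""
    return A_i.T @ (A_i @ x - b_i) / len(b_i)

def consensus_step(x_i, neighbor_estimates):
    """Average the local estimate with all neighbor estimates,
    then correct the average with a local gradient step."""
    averaged = np.mean([x_i] + list(neighbor_estimates), axis=0)
    return averaged - lr * local_gradient(averaged)

# One iteration: the worker waits for estimates from two neighbors,
# averages them with its own estimate, then applies its local correction.
x_i = np.zeros(dim)
neighbors = [rng.normal(size=dim), rng.normal(size=dim)]
x_i = consensus_step(x_i, neighbors)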
Cited by 2 publications (3 citation statements)
References 31 publications (37 reference statements)
“…Since the term federated learning was introduced in the seminal work , there has been explosive growth in federated learning research. For example, one line of work focuses on designing algorithms that achieve higher learning accuracy and on analyzing their convergence, e.g., (Smith et al. 2017; Li et al. 2020b; Liu et al. 2020; Wang et al. 2020b; Xiong, Yan, and Li 2021). Another line of work aims to improve the communication efficiency between the central server and clients through compression or sparsification (Konečnỳ et al. 2016; Suresh et al. 2017; Xu et al. 2019), communication frequency optimization (Wang and Joshi 2019; Karimireddy et al. 2020), client selection (Lai et al. 2021; Wang et al. 2020a), etc.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unlike traditional centralized machine learning, the data samples of each client in FL follow a non-identical and independent distribution (non-IID), introducing bias that slows down or even derails training. A few recent studies address these challenges through model compression (Konečnỳ et al. 2016; Suresh et al. 2017), communication frequency optimization (Wang and Joshi 2019; Karimireddy et al. 2020), and client selection (Lai et al. 2021; Wang et al. 2020a; Xiong, Yan, and Li 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…The use of many trained models in a distributed context has been addressed by [13]. The proposed architectural approach sets up a dynamic set of backup workers together with a custom-designed consensus algorithm, and experiments show a linear speedup in convergence.…”
Section: Related Work (mentioning)
confidence: 99%