Proceedings of the 19th International Middleware Conference 2018
DOI: 10.1145/3274808.3274828

Aggressive Synchronization with Partial Processing for Iterative ML Jobs on Clusters

Abstract: Executing distributed machine learning (ML) jobs on Spark follows the Bulk Synchronous Parallel (BSP) model, where parallel tasks execute the same iteration at the same time and the generated updates must be synchronized on parameters when all tasks are finished. However, the parallel tasks rarely have the same execution time due to sparse data, so the synchronization has to wait for tasks that finish late. Moreover, running Spark on heterogeneous clusters makes it even worse because of stragglers, where the sync…

Cited by 15 publications (6 citation statements)
References 26 publications (24 reference statements)
“…While GeePS supports synchronous, bounded asynchronous and asynchronous parameter synchronization, it is designed to minimize the straggler problem on GPUs, and hence, achieves best convergence speed when using the synchronous approach. Wang et al [184] propose an aggressive synchronization scheme that is based on BSP, named A-BSP. Different from BSP, A-BSP allows the fastest task to fetch current updates generated by the other (straggler) tasks that have only partially processed their input data.…”
Section: Synchronization (mentioning)
confidence: 99%
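The behaviour described in this statement can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the implementation from the cited paper: the `Task` class, the `absp_sync` function, the per-tick `speed` unit, and the idea of representing each record's contribution as one precomputed number are all hypothetical. It shows a synchronization that fires when the fastest task finishes and aggregates whatever the stragglers have produced so far.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Task:
    speed: int        # records processed per tick (hypothetical unit)
    updates: list     # one precomputed update value per input record
    cursor: int = 0   # index of the next unprocessed record


def absp_sync(tasks):
    """Synchronize as soon as the fastest unfinished task completes its input.

    Returns (aggregated_update, records_processed) for this round.
    """
    active = [t for t in tasks if t.cursor < len(t.updates)]
    if not active:
        return 0.0, 0
    # Ticks until the first active task runs out of input.
    ticks = min(ceil((len(t.updates) - t.cursor) / t.speed) for t in active)
    total, processed = 0.0, 0
    for t in active:
        # A straggler contributes only the records it reached in time.
        n = min(t.speed * ticks, len(t.updates) - t.cursor)
        total += sum(t.updates[t.cursor:t.cursor + n])
        processed += n
        t.cursor += n  # stragglers resume from here in the next round
    return total, processed


# Assumed example: one fast task and one straggler, 8 records each.
tasks = [Task(speed=4, updates=[1.0] * 8), Task(speed=1, updates=[1.0] * 8)]
first = absp_sync(tasks)
second = absp_sync(tasks)
```

With these made-up speeds, the first synchronization happens after 2 ticks and covers 10 of the 16 records, whereas a BSP barrier would wait 8 ticks for the slow task. In this simplified model a finished task stays idle afterwards; a real system would presumably hand it new input.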
“…SSP [12][61] enables processes to execute the training independently and allows fast workers to advance a bounded number of iterations ahead of slow workers. A-BSP [59] is proposed to aggressively synchronize parameters by applying the partial updates from slower workers. But all these approaches target on the centralized PS architecture.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent efforts propose alternative synchronization models to mitigate the skewness. A-BSP [23] is a BSP-based aggressive synchronization model that uses updates from partial input data for synchronization. SSP [3], [24] uses flexible synchronization and allows any worker to be up to a bounded number of iterations ahead of the slowest worker.…”
Section: Related Work (mentioning)
confidence: 99%
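The SSP rule quoted above (a worker may be up to a bounded number of iterations ahead of the slowest worker) can be sketched as a simple clock check. This is a minimal illustration, not code from any of the cited systems: the function name `may_start_next`, the `clocks` dictionary, and the convention of counting completed iterations are assumptions, and other formulations shift the bound by one.

```python
def may_start_next(clocks, worker, bound):
    """Return True if `worker` may begin its next iteration.

    clocks: dict mapping worker name -> number of completed iterations.
    bound:  maximum allowed lead (in completed iterations) over the
            slowest worker; bound=0 behaves like a BSP barrier.
    """
    slowest = min(clocks.values())
    return clocks[worker] - slowest <= bound


# Assumed example: "fast" has completed 3 iterations, "slow" only 1.
clocks = {"fast": 3, "slow": 1}
fast_ok = may_start_next(clocks, "fast", bound=2)       # lead of 2 is allowed
clocks["fast"] += 1                                     # fast finishes another
fast_blocked = not may_start_next(clocks, "fast", bound=2)  # lead of 3 is not
slow_ok = may_start_next(clocks, "slow", bound=2)       # slowest never waits
```

Under this convention the slowest worker is never blocked, and setting `bound=0` forces every worker to wait until all others have completed the same iteration.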