2016 9th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS) 2016
DOI: 10.1109/mtags.2016.04

Learning to Diagnose Stragglers in Distributed Computing

Cited by 10 publications (4 citation statements); references 11 publications.

“…When a task in execution becomes slower than other tasks in the same job, this task is called a "straggler task" which prolong the entire job and the cluster throughput will be affected. There are numerous causes that make a task take a long time in execution and turned into a straggler [29], [30]. These causes like hardware heterogeneity, overawed machines, network congestion, bad code and contention of resources between tasks running on the straggler machine.…”
Section: B Straggler Problem (mentioning; confidence: 99%)

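The quoted definition (a task noticeably slower than the other tasks of the same job) can be illustrated with a minimal sketch. It assumes a simple threshold rule that flags a task when its runtime exceeds a fixed multiple of the job's median task runtime. The function name `find_stragglers` and the 1.5× factor are hypothetical choices for illustration and do not come from the cited works.

```python
from statistics import median


def find_stragglers(durations: dict[str, float], factor: float = 1.5) -> list[str]:
    """Return IDs of tasks whose runtime exceeds `factor` times the job median.

    Hypothetical threshold rule, for illustration only.
    """
    if not durations:
        return []
    m = median(durations.values())
    # A task is flagged when it is markedly slower than its peers in the same job.
    return sorted(t for t, d in durations.items() if d > factor * m)


# Runtimes in seconds for the tasks of one (made-up) job.
job = {"t1": 10.0, "t2": 11.0, "t3": 9.5, "t4": 31.0}
print(find_stragglers(job))  # ['t4']
```

Comparing against the median rather than the mean keeps a single very slow task from inflating the reference value it is being compared to.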
“…The successful execution of a task is important to avoid straggler occurrence during execution of jobs [24][25][26][27][28] [67] [68]. During job execution, stragglers can occur due to unhandled requests or ineffective task interference and task incompatibility management.…”
Section: Task Execution (mentioning; confidence: 99%)

“…The proposed techniques improve the performance of computing systems by reducing task stragglers occurrence. Cong et al [27] proposed a Machine Learning based Straggler Detection (MLSD) technique using unsupervised clustering method. The proposed technique effectively manages the resources while executing the jobs and diagnosing the stragglers at runtime.…”
Section: Straggler Detection; Straggler Mitigation (mentioning; confidence: 99%)

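The quote gives no details of MLSD's features or clustering algorithm, so the following is only a stand-in sketch of the general idea of unsupervised straggler detection. It uses plain 2-means clustering over hypothetical per-task features (runtime in seconds, fraction of time spent waiting on CPU) and labels the cluster whose centroid has the higher runtime as stragglers. The names `kmeans2` and `detect_stragglers` and the feature choice are assumptions; this is not the MLSD technique of Cong et al.

```python
import math


def kmeans2(points, iters=100):
    """Plain 2-means clustering; returns (labels, centroids)."""
    # Deterministic start: the fastest and the slowest task (feature 0 is runtime).
    centroids = [min(points, key=lambda p: p[0]), max(points, key=lambda p: p[0])]
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        new = [0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
               for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def detect_stragglers(features):
    """Label the cluster with the higher mean runtime as stragglers (hypothetical rule)."""
    ids = list(features)
    labels, centroids = kmeans2([features[i] for i in ids])
    slow = 0 if centroids[0][0] > centroids[1][0] else 1
    return sorted(i for i, lab in zip(ids, labels) if lab == slow)


# Hypothetical features per task: (runtime in seconds, CPU-wait fraction).
tasks = {"t1": (10.0, 0.10), "t2": (11.0, 0.12), "t3": (9.5, 0.08),
         "t4": (31.0, 0.55), "t5": (29.0, 0.60)}
print(detect_stragglers(tasks))  # ['t4', 't5']
```

Two-cluster k-means always splits the tasks into two groups, even when no task is actually slow, so a practical detector would also need a separation test (for example, a minimum ratio between the two centroid runtimes). Features would normally be normalized as well, so that runtime does not dominate the distance.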
“…This allows reconstruction of the full computation result with a decoding step even if not all the workers have returned computation results. Therefore, CC schemes can reduce latency and increase reliability in scenarios in which the worker nodes are, e.g., subject to random failures or straggling [1].…”
Section: Introduction (mentioning; confidence: 99%)
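
The decoding idea behind coded computation (CC) can be shown with a textbook (3, 2) parity code for a matrix-vector product. This is an illustrative choice, not necessarily the scheme analysed in [1]. The rows of A are split into two blocks A1 and A2, a third worker computes with the parity block A1 + A2, and any two of the three partial results suffice to recover A·x. The helpers `matvec`, `encode`, and `decode` are hypothetical names.

```python
def matvec(rows, x):
    """Dense matrix-vector product on lists of lists."""
    return [sum(a * b for a, b in zip(r, x)) for r in rows]


def encode(A):
    """Split A's rows into halves A1, A2 and add a parity block A1 + A2."""
    if len(A) % 2:
        raise ValueError("this sketch assumes an even number of rows")
    h = len(A) // 2
    A1, A2 = A[:h], A[h:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]  # one coded block per worker: 0, 1, 2


def decode(results):
    """Recover A @ x from any two of the three worker outputs.

    `results` maps a worker index (0, 1, 2) to its partial product.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results and 2 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # A2 x = (A1 + A2) x - A1 x
    elif 1 in results and 2 in results:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # A1 x = (A1 + A2) x - A2 x
    else:
        raise ValueError("need at least two of the three worker results")
    return y1 + y2  # concatenate the two row blocks


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
# Each worker multiplies its coded block by x.
outputs = {i: matvec(block, x) for i, block in enumerate(encode(A))}
del outputs[1]  # simulate worker 1 straggling: its result never arrives
print(decode(outputs))  # [3, 7, 11, 15]
```

The price of tolerating one straggler here is one extra worker and a cheap subtraction at decode time; general MDS codes extend the same idea to tolerating more slow workers.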