2008 37th International Conference on Parallel Processing
DOI: 10.1109/icpp.2008.22

Realistic Models and Efficient Algorithms for Fault Tolerant Scheduling on Heterogeneous Platforms

Abstract: Most list scheduling heuristics rely on a simple platform model in which communication contention is not taken into account. In addition, it is generally assumed that the processors in the system are completely safe. To schedule precedence graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting ε arbitrary fail-silent (fail-stop) processor failures. We focus on a bi-criteria approach, where we aim at minimizing the tot…
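Tolerating ε arbitrary fail-silent processor failures with active replication means every task must keep at least one live replica whatever ε processors fail, which requires at least ε+1 replicas per task on distinct processors. The sketch below is a hypothetical checker (the names `survives_all_failures` and `placement` are illustrative, not from the paper) that enumerates every failure set of size ε and verifies that no task loses all of its replicas.

```python
from __future__ import annotations

from itertools import combinations


def survives_all_failures(placement: dict[str, set[int]],
                          processors: list[int], eps: int) -> bool:
    """Return True if every task keeps at least one live replica for
    every possible set of `eps` fail-silent processors.

    `placement` maps a task name to the set of processors hosting its
    replicas (a hypothetical representation chosen for this sketch).
    """
    for failed in combinations(processors, eps):
        dead = set(failed)
        for procs in placement.values():
            # All replicas of this task sit on failed processors.
            if procs <= dead:
                return False
    return True


# Each task has 2 replicas on distinct processors: tolerates 1 failure, not 2.
placement = {"T1": {0, 1}, "T2": {1, 2}}
print(survives_all_failures(placement, [0, 1, 2], eps=1))  # True
print(survives_all_failures(placement, [0, 1, 2], eps=2))  # False
```

Exhaustive enumeration is exponential in ε and is only meant to make the ε+1 requirement concrete; a scheduler would instead enforce it by construction when placing replicas.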

Cited by 8 publications (14 citation statements); references 26 publications (64 reference statements).
“…In other words, a given processor can simultaneously send a message, receive another message, and perform some computation. The bi-directional one-port model seems closer to the actual capabilities of modern networks (see the discussion of related work in [3,4]). Indeed, it seems to fit the performance of some current MPI implementations, which serialize asynchronous MPI sends as soon as message sizes exceed a few megabytes [3].…”
Section: Introduction (mentioning; confidence: 94%)
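In the bi-directional one-port model described above, each processor has one outgoing and one incoming port that work independently of each other and of computation, and each port handles at most one message at a time. The minimal sketch below assumes an append-only port timeline (a message is placed after the last one already booked on a port, with no gap insertion); `Processor` and `schedule_message` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    send_free: float = 0.0  # time the outgoing port becomes idle
    recv_free: float = 0.0  # time the incoming port becomes idle


def schedule_message(src: Processor, dst: Processor,
                     ready: float, duration: float) -> float:
    """Book a transfer under the bi-directional one-port model.

    The message occupies src's send port and dst's receive port for
    `duration` and cannot start before its data is `ready`. Returns the
    arrival time at dst.
    """
    start = max(ready, src.send_free, dst.recv_free)
    end = start + duration
    src.send_free = end
    dst.recv_free = end
    return end


p0, p1, p2 = Processor(), Processor(), Processor()
print(schedule_message(p0, p1, 0.0, 3.0))  # 3.0
print(schedule_message(p0, p2, 0.0, 3.0))  # 6.0: p0's send port is serialized
print(schedule_message(p1, p0, 0.0, 2.0))  # 2.0: p0 receives while it sends
```

The second message waits for p0's single send port, which is the contention a contention-free model ignores, while the third overlaps with p0's outgoing traffic because sending and receiving use separate ports.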
“…In this paper, we introduce the Iso-Level Contention-Aware Fault Tolerant (Iso-Level CAFT) scheduling algorithm (a new version of CAFT [3], which was initially designed to address both network contention and fault-tolerant scheduling) that aims at tolerating multiple processor failures without sacrificing latency. Iso-Level CAFT is based on an active replication scheme to mask failures, so that there is no need to detect and handle such failures.…”
Section: Introduction (mentioning; confidence: 99%)
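Active replication masks failures because a replica does not need to know which predecessor copies failed: it simply uses the first copy of each input that arrives. The sketch below illustrates that idea under simplifying assumptions (fail-silent replicas send nothing, message arrival times are already known); `earliest_input` and `start_time` are hypothetical helpers, not functions from the paper.

```python
from __future__ import annotations


def earliest_input(arrivals: dict[int, float], failed: set[int]) -> float:
    """Time the first copy of a predecessor's output reaches the receiver.

    `arrivals` maps the processor hosting each predecessor replica to the
    arrival time of its message; replicas on failed processors send nothing.
    """
    live = [t for proc, t in arrivals.items() if proc not in failed]
    if not live:
        raise RuntimeError("all replicas of the predecessor failed")
    return min(live)


def start_time(pred_arrivals: list[dict[int, float]],
               failed: set[int], cpu_free: float) -> float:
    """A replica starts once its CPU is idle and every predecessor
    has delivered at least one copy of its data."""
    return max([cpu_free] + [earliest_input(a, failed) for a in pred_arrivals])


a1 = {0: 5.0, 1: 7.0}  # copies of predecessor 1's output
a2 = {2: 4.0, 3: 9.0}  # copies of predecessor 2's output
print(start_time([a1, a2], set(), cpu_free=3.0))   # 5.0 (fault-free)
print(start_time([a1, a2], {0}, cpu_free=3.0))     # 7.0 (processor 0 failed)
print(start_time([a1, a2], {0, 2}, cpu_free=3.0))  # 9.0
```

No failure detection appears anywhere: failures only delay the start time by removing early copies, which is what makes the scheme transparent to the receiving task.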
“…(4) Note that if the task and its predecessor are scheduled on the same processor, there is no need to wait for copies from the other predecessor replicas to achieve fault-tolerant operation: if the processor has not failed, task T receives its data from the predecessor T′ on the same processor; otherwise the processor is faulty and there is no need to send data to it [17].…”
Section: B. Schedule Length Calculation (mentioning; confidence: 99%)
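The same-processor rule quoted above can be expressed as a small decision function. This is a sketch only: `senders_for` is a hypothetical name, and the fallback branch assumes the naive active-replication behavior in which every predecessor replica sends its own copy.

```python
from __future__ import annotations


def senders_for(replica_proc: int, pred_procs: set[int]) -> set[int]:
    """Processors whose predecessor replicas must deliver data to a
    replica scheduled on `replica_proc`.

    If a predecessor replica shares the processor, only that local copy
    is used: either the processor is alive and the data is already
    there, or it has failed and the receiving replica is dead anyway.
    """
    if replica_proc in pred_procs:
        return {replica_proc}  # local copy, no network message
    # Assumption: otherwise every predecessor replica sends a copy.
    return set(pred_procs)


print(senders_for(1, {0, 1, 2}))  # {1}
print(senders_for(3, {0, 1, 2}))  # {0, 1, 2}
```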
“…The CAFT (Contention-Aware Fault Tolerant) scheduling algorithm was introduced in [17]. CAFT is another scheduling heuristic in which each task is replicated N times on N different processors to tolerate N-1 permanent processor failures; however, the number of communications induced by the replication scheme is drastically reduced.…”
Section: Related Work (mentioning; confidence: 99%)
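To see why replication inflates communication, consider one precedence edge whose endpoints each have N replicas: if every predecessor replica sends to every successor replica, the edge costs N × N messages. The hypothetical sketch below counts network messages for one edge with and without the same-processor rule quoted in the previous statement (local predecessor copy, no message). It illustrates only that one rule, not the full mechanism CAFT uses to reduce communications.

```python
from __future__ import annotations


def naive_messages(pred_procs: set[int], succ_procs: set[int]) -> int:
    """Every predecessor replica sends to every successor replica."""
    return len(pred_procs) * len(succ_procs)


def messages_with_local_rule(pred_procs: set[int], succ_procs: set[int]) -> int:
    """A successor replica co-located with a predecessor replica uses the
    local copy (zero network messages); any other successor replica
    receives one copy from each predecessor replica."""
    return sum(0 if p in pred_procs else len(pred_procs) for p in succ_procs)


# N = 3 replicas per task (tolerating 2 failures).
pred, succ = {0, 1, 2}, {0, 1, 3}
print(naive_messages(pred, succ))            # 9
print(messages_with_local_rule(pred, succ))  # 3
print(messages_with_local_rule(pred, pred))  # 0: fully co-located replicas
```

Co-locating successor replicas with predecessor replicas removes whole rows of the N × N message matrix, which is one way replication-induced traffic, and hence contention on the one-port links, can be cut.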