2008
DOI: 10.1016/j.future.2007.02.002

Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols

Cited by 57 publications
(33 citation statements)
References 6 publications
“…One promising alternative is to use local storage (memory, SSD, local disks) [1], [2], [4]. During checkpoint, the application usually stops the execution until the checkpoint is safely stored, using what is called the blocking algorithm [5].…”
Section: Motivation (mentioning)
confidence: 99%
“…This non-blocking algorithm is totally asynchronous and runs in conjunction with the application. However, since it needs to store the in-flight messages as part of the checkpoint, it has a higher memory footprint and a non-trivial implementation [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Several algorithms have been proposed to coordinate checkpoints, the most usual being the Chandy-Lamport algorithm [6] and the blocking coordinated checkpointing [5,17], which silences the network. In these algorithms, waves of tokens are exchanged to form a recovery line that eliminates orphan messages and detects in-transit messages.…”
Section: Building a Consistent Recovery Set (mentioning)
confidence: 99%
“…Among rollback-recovery techniques [7], sender-based message logging [1,8,20] with checkpointing [2,3,6,11,14] is one of the most lightweight fault-tolerance techniques to be capable of being applied in those fields. It may considerably lower high failure-free overhead of receiver-based message logging [15,21] resulting from synchronously logging each message into stable storage, which can be realized by using volatile memory of its sender as storage for logging [1,7,8,10,20].…”
Section: Introduction (mentioning)
confidence: 99%