Local rollback for resilient MPI applications with application-level checkpointing and message logging
2019, DOI: 10.1016/j.future.2018.09.041

Cited by 23 publications (23 citation statements). References: 14 publications.

“…A local rollback protocol that can be generally applied to single program, multiple data (SPMD) applications is proposed in [35]. It combines the ComPiler for Portable Checkpointing (CPPC) tool, message logging, and ULFM.…”
Section: Non-shrinking Solutions (mentioning; confidence: 99%)
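
As a rough illustration of how local rollback and message logging fit together, the following Python sketch simulates a ring of ranks that exchange one message per step. It is a toy model, not CPPC, ULFM, or the protocol of [35]: the ring topology, the update rule, and the names `simulate`, `recover`, and `update` are hypothetical. Each sender logs a payload before delivering it. When one rank fails, only that rank restores its own checkpoint and re-executes, taking its inputs from its predecessor's log and suppressing re-sends that its successor already consumed, while the surviving ranks keep their state.

```python
from dataclasses import dataclass, field


@dataclass
class Rank:
    rid: int
    value: int
    ckpt_step: int = 0
    ckpt_value: int = 0
    # Sender-based log: (dest, step) -> payload, recorded before delivery.
    sent_log: dict = field(default_factory=dict)


def update(value: int, received: int, step: int) -> int:
    """Hypothetical deterministic per-step computation."""
    return value + received + step


def recover(ranks: list, f: int, now: int) -> None:
    """Locally roll back rank f; the surviving ranks keep their state."""
    n = len(ranks)
    failed = ranks[f]
    failed.value = failed.ckpt_value  # restore the failed rank's own checkpoint
    failed.sent_log.clear()           # its in-memory log died with the process
    pred, succ = (f - 1) % n, (f + 1) % n
    for s in range(failed.ckpt_step, now):
        # Rebuild the outgoing log entry, but do not re-deliver it:
        # the successor consumed this message before the failure.
        failed.sent_log[(succ, s)] = failed.value
        # Replay the receive from the predecessor's sender-side log.
        failed.value = update(failed.value, ranks[pred].sent_log[(f, s)], s)


def simulate(n, steps, ckpt_every, fail_rank=None, fail_at=None):
    ranks = [Rank(rid=i, value=i, ckpt_value=i) for i in range(n)]
    for s in range(steps):
        if s % ckpt_every == 0:
            for r in ranks:  # each rank saves its state before step s
                r.ckpt_step, r.ckpt_value = s, r.value
        if s == fail_at:
            recover(ranks, fail_rank, s)  # failure detected just before step s
        inbox = {}
        for r in ranks:  # send phase: log at the sender, then deliver
            dest = (r.rid + 1) % n
            r.sent_log[(dest, s)] = r.value
            inbox[dest] = r.value
        for r in ranks:  # receive + compute phase
            r.value = update(r.value, inbox[r.rid], s)
    return [r.value for r in ranks]
```

Because the per-step computation is deterministic, the rolled-back rank reaches exactly the state it had before the failure, so `simulate(4, 20, 5)` and `simulate(4, 20, 5, fail_rank=2, fail_at=13)` return the same final values. A real system would additionally need failure notification and communicator repair (what ULFM provides for MPI) and garbage collection of log entries older than the latest checkpoints.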
“…Buddy in-memory checkpointing can be a limiting aspect as it does not cover multiple-node failure scenarios, but it still provides efficient resilience as these scenarios are unlikely. This approach also makes it possible to adopt a local restart strategy such as [13], with computation kept running on the surviving nodes. This is made possible thanks to the ULFM MPI proposal implementation [5] and by using message logging [3], [7].…”
Section: Context and Related Work (mentioning; confidence: 99%)
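
The limitation of buddy in-memory checkpointing can be shown with a minimal Python sketch, assuming a hypothetical ring pairing in which node `i` keeps its own checkpoint plus a replica of node `i - 1`'s; the function names are illustrative only. A failed node can be restored as long as the node holding its replica survives, but losing a node together with its buddy destroys both copies of that checkpoint.

```python
def buddy_of(i: int, n: int) -> int:
    """Node that stores a replica of node i's checkpoint (hypothetical ring pairing)."""
    return (i + 1) % n


def take_checkpoints(states: list) -> list:
    """Each node keeps its own checkpoint plus the one replica it holds for a neighbour."""
    n = len(states)
    memory = [{"own": None, "replica": None} for _ in range(n)]
    for i, state in enumerate(states):
        memory[i]["own"] = state
        memory[buddy_of(i, n)]["replica"] = state
    return memory


def recover_from_buddies(memory: list, failed: set):
    """Return {node: restored_state}, or None if some checkpoint lost both copies."""
    n = len(memory)
    restored = {}
    for f in sorted(failed):
        holder = buddy_of(f, n)
        if holder in failed:
            return None  # own copy and replica are both gone
        restored[f] = memory[holder]["replica"]
    return restored
```

For example, with four nodes, losing nodes 0 and 2 is recoverable, whereas losing nodes 1 and 2 is not, because node 2 held the only replica of node 1's checkpoint.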
“…Implementations of causal message logging protocols in MPI libraries include MPICH-V [7], MPICH-V2 [10], and Charm++ [19]. The work of Losada et al [18] implements the most recent variation on pessimistic message logging protocols; it employs the VProtocol in Open MPI¹, and a source-to-source compiler called CPPC. Our work, in contrast to pessimistic, optimistic, and causal message logging protocols, entirely disables event logging; we statically analyse the underlying applications, and guarantee the absence of non-deterministic events that need to be replayed.…”
Section: Related Work (mentioning; confidence: 99%)
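
The difference between logging and not logging events (determinants) can be illustrated with a small Python sketch. It models only one common source of non-determinism, a wildcard receive such as `MPI_ANY_SOURCE`; the `Recv` type and the `determinants_to_log` function are hypothetical. If static analysis can show that every receive names a fixed source, the list of determinants to log is empty, which is the situation the statement above describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Recv:
    step: int
    source: Optional[int]  # None models a wildcard receive (like MPI_ANY_SOURCE)


def determinants_to_log(receives: List[Recv], matched: Dict[int, int]) -> List[Tuple[int, int]]:
    """Events a message-logging protocol must record so replay matches the same senders.

    A receive that names its source matches the same sender on replay, so it
    needs no determinant. A wildcard receive may match a different sender on
    replay, so the sender it actually matched (matched[step]) must be logged.
    """
    return [(r.step, matched[r.step]) for r in receives if r.source is None]
```

For instance, receives that all name concrete sources yield `[]`, while a single wildcard receive that matched rank 3 at step 0 yields `[(0, 3)]`.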