2002
DOI: 10.1145/568522.568525

A survey of rollback-recovery protocols in message-passing systems

Abstract: This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based. Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determina…

Cited by 1,392 publications (1,079 citation statements)
References 39 publications
“…Optimistic message logging is very attractive for providing fault-tolerance with low failure-free overhead for large-scale distributed systems [3]. However, it may suffer from cascading rollback due to its message log volatility.…”
Section: Introduction (mentioning)
confidence: 99%
“…During recovery, log-based rollback-recovery protocols force the execution of the system to be identical to the one that occurred before the failure, up to the maximum recoverable state. Therefore, the system always recovers to a state that is consistent with the input and output interactions that occurred up to the maximum recoverable state [45]. …”
Section: Log-based Rollback Recovery (mentioning)
confidence: 75%
“…Lost messages may occur when in-transit messages between two processes are not captured by a checkpointing mechanism. Therefore when these two checkpoint files are restored for the application to continue, p2 will never receive the message m1 (unless retransmitted) and this can lead to a failure [45].…”
Section: Terminology (mentioning)
confidence: 99%