Proceedings of 16th International Conference on Distributed Computing Systems
DOI: 10.1109/icdcs.1996.507906

A low-overhead recovery technique using quasi-synchronous checkpointing

Cited by 85 publications (70 citation statements)
References 16 publications
“…In practice, in order to achieve high availability, self-repairing and self-healing mechanisms are widely adopted in fault-tolerant systems to achieve automatic recovery after the crash occurs. Particularly in middleware systems, there are many techniques and algorithms are proposed to achieve the self-repairing or self-healing goal, such as the connector-based self-healing system described in [32,77] or the reflection technique adopted in [12] or the snapshot algorithms in [61,65]. As we can see that the crash-recovery failure is quite common in many fault-tolerant systems.…”
Section: Motivation (citation type: mentioning)
confidence: 99%
“…However, it will increase the recovery time as greater rollback will be required. Although some algorithms were proposed to reduce the number of checkpoints to be saved on stable storage, yet, to ensure correctness, a process still needs to keep many more checkpoints in uncoordinated checkpointing algorithms [55], [58], [59], [97]. Generally speaking, uncoordinated checkpointing approaches suffer from the complexities of finding a consistent recovery line after the failure, domino-effect, high stable storage overhead of saving multiple checkpoints of each process, and the overhead of garbage collection.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Most nonblocking algorithms [13], [24], [30] use a Checkpoint Sequence Number (sn) to avoid inconsistencies. More specifically, a process is forced to take a checkpoint if it receives a computation message whose sn is greater than its local sn.…”
Section: The Basic Idea Behind Nonblocking Algorithms (citation type: mentioning)
confidence: 99%
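
The forced-checkpoint rule quoted above can be illustrated with a minimal sketch. It assumes that every computation message piggybacks the sender's sn and that a receiver taking a forced checkpoint adopts the incoming sn; the quoted statement specifies only the trigger condition, so the adoption rule and all names here (`Message`, `Process`, `take_checkpoint`) are hypothetical and not taken from any of the cited algorithms.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # sender's checkpoint sequence number, piggybacked (assumption)


@dataclass
class Process:
    name: str
    sn: int = 0
    state: list = field(default_factory=list)        # toy application state
    checkpoints: list = field(default_factory=list)  # (sn, state snapshot)

    def take_checkpoint(self, sn=None):
        # Basic checkpoint: advance the local sn by one.
        # Forced checkpoint: adopt the sn carried by the message (assumption).
        self.sn = self.sn + 1 if sn is None else sn
        self.checkpoints.append((self.sn, list(self.state)))

    def send(self, payload):
        # Tag every outgoing computation message with the current local sn.
        return Message(payload, self.sn)

    def receive(self, msg):
        # Rule from the quoted text: a message whose sn is greater than the
        # local sn forces a checkpoint. Here it is taken before delivery.
        if msg.sn > self.sn:
            self.take_checkpoint(msg.sn)
        self.state.append(msg.payload)


p, q = Process("p"), Process("q")
p.take_checkpoint()        # p initiates a checkpoint, p.sn becomes 1
q.receive(p.send("a"))     # q.sn (0) < 1, so q is forced to checkpoint first
q.receive(p.send("b"))     # sn values are equal, no further checkpoint
```

Taking the forced checkpoint before the message is delivered keeps the message out of the saved state, so the checkpoint never records a receive whose send falls after the sender's checkpoint with the same sn.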
“…More information on how to deal with process failures can be found in [20], [24], [28]. Since failure detection and failure recovery are orthogonal to our discussion, we will not discuss it further.…”
Section: Handling Failures During Checkpointing (citation type: mentioning)
confidence: 99%