1996
DOI: 10.1145/233008.233050

Minimizing completion time of a program by checkpointing and rejuvenation

Abstract: Checkpointing with rollback-recovery is a well-known technique to reduce the completion time of a program in the presence of failures. While checkpointing is corrective in nature, rejuvenation refers to preventive maintenance of software aimed to reduce unexpected failures mostly resulting from the "aging" phenomenon. In this paper, we show how both these techniques may be used together to further reduce the expected completion time of a program. The idea of using checkpoints to reduce the amount of roll…

Citations: cited by 60 publications (44 citation statements)
References: 12 publications
“…Micro-reboot, checkpoint and recovery, and software rejuvenation address non-deterministic failures by rebooting the components that caused the failure (micro-reboot), by rolling back to the latest consistent state and re-executing the failing operations (checkpoint and recovery), and by regularly restarting the system to prevent failures due to software age (rejuvenation) [4,11,12]. Similar to these approaches, Qin et al propose the Rx method that partially re-executes the failing program under modified environment conditions [18].…”
Section: Related Work (mentioning)
confidence: 99%
“…In [20] Chakravorty and Kale explain how pro-active fault tolerance based on processor failure information can lead to migration of tasks from faulty processors to healthy nodes. Software rejuvenation along with checkpointing has been used in [21] to minimize the application execution time pro-actively. The FT policy of proactive combined with reactive policy is relatively less explored.…”
Section: B Related Work (mentioning)
confidence: 99%
“…In [65] it is proposed and analyzed the combination of software rejuvenation (preventive fault treatment) with checkpointing and recovery to reduce the chances of activating a fault and simultaneously minimizing the loss of computation when there is a failure.…”
Section: Checkpointing and Recovery (mentioning)
confidence: 99%