Probabilistic checkpointing, a recently proposed technique, has one drawback: aliasing. Analytically, 64-bit signatures have a negligible probability of aliasing. In practice, however, the shift-XOR signature generation function used with probabilistic checkpointing shows a high aliasing rate, which limits the practicality of the technique. In this paper, we consider two enhancements that make probabilistic checkpointing more reliable: the signature generation function and the recovery scheme. For signature generation, we propose two functions: HALF for small block sizes (256 bytes or less) and C-HALF (CRC combined HALF) for large block sizes (more than 256 bytes). Both have an aliasing probability close to the analytic results and incur small overhead. For recovery, we propose a scheme that ensures the safety of probabilistic checkpointing by using a spare node to examine the correctness of previous checkpoints at recovery time. We analyze the recovery scheme with a mathematical model and use the model to derive an optimal checkpoint interval.
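To make the role of block signatures and aliasing concrete, the following is a minimal sketch of signature-based change detection, not the paper's method. It assumes a fixed 256-byte block size and uses a truncated BLAKE2b hash as a stand-in 64-bit signature; it implements neither shift-XOR, HALF, nor C-HALF, whose definitions are not given here. The names `BLOCK_SIZE`, `signature`, `changed_blocks`, and `analytic_alias_probability` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 256  # bytes; assumed for illustration only


def signature(block: bytes) -> int:
    """Stand-in 64-bit signature (truncated BLAKE2b), NOT HALF or C-HALF."""
    return int.from_bytes(hashlib.blake2b(block, digest_size=8).digest(), "big")


def changed_blocks(memory: bytes, old_sigs: list[int]) -> tuple[list[int], list[int]]:
    """Return (indices of blocks whose signature changed, new signature list).

    A block that was modified but happens to produce the same signature as
    before is NOT reported; that missed update is an aliasing event.
    """
    dirty: list[int] = []
    new_sigs: list[int] = []
    for offset in range(0, len(memory), BLOCK_SIZE):
        idx = offset // BLOCK_SIZE
        sig = signature(memory[offset:offset + BLOCK_SIZE])
        new_sigs.append(sig)
        # A block with no previous signature is always treated as changed.
        if idx >= len(old_sigs) or old_sigs[idx] != sig:
            dirty.append(idx)
    return dirty, new_sigs


def analytic_alias_probability(bits: int = 64) -> float:
    """Chance that two distinct blocks collide under an ideal uniform signature."""
    return 2.0 ** -bits


if __name__ == "__main__":
    mem = bytearray(4 * BLOCK_SIZE)
    dirty, sigs = changed_blocks(bytes(mem), [])       # first checkpoint: every block is new
    mem[300] = 1                                       # modify one byte in block 1
    dirty, sigs = changed_blocks(bytes(mem), sigs)     # later checkpoint: only block 1 is saved
    print(dirty, analytic_alias_probability())
```

Under an ideal signature the per-comparison aliasing probability is 2^-64, which is the analytic figure the abstract calls negligible. A weaker signature function can collide far more often on real memory contents, which is the gap the proposed functions aim to close.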