Checkpointing in a time warp synchronized parallel simulator is a necessary and potentially expensive operation. In the simple case, a time warp simulator checkpoints every χ events, for some fixed value χ. For larger values of χ, the simulator requires less overhead for saving the state, but incurs an increased latency during rollback. Thus, the problem is to balance the time to save states against the time to coast forward upon rollback. Unfortunately, determining an optimal value for χ statically is very difficult, and the optimal value can vary widely, even between closely related instances of a time warp simulator. Furthermore, the optimal checkpoint interval may actually vary over the lifetime of the simulation.
To address these problems, several investigators have proposed dynamically adjusting the checkpoint interval χ as the simulation progresses. This paper analyzes three previous techniques for dynamically sizing checkpoint intervals and presents a new, heuristic algorithm for this purpose. All four techniques are implemented in a common application domain (digital system simulation from VHDL descriptions) and a direct comparison between the algorithms is performed. The results show a significant difference in the performance of the implemented algorithms. However, in virtually all cases, the dynamic algorithms performed close to or better than the best static value. Furthermore, the best algorithms performed as much as 12% better than the best static value.
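The trade-off between state-saving time and coast-forward time can be illustrated with a toy cost model. The sketch below is not one of the four algorithms compared in the paper; the function names, the linear cost model, and the parameters `save_cost`, `event_cost`, and `rollback_prob` are all assumptions made for illustration. It shows the per-event overhead as a function of χ and one generic hill-climbing step that a dynamic scheme might apply.

```python
def overhead_per_event(chi, save_cost, event_cost, rollback_prob):
    """Toy per-event overhead (assumed model, not taken from the paper).

    save_cost / chi is the amortized cost of saving state every chi events.
    On a rollback, the target event lies on average (chi - 1) / 2 events
    past the nearest earlier checkpoint, and those events are re-executed.
    """
    save = save_cost / chi
    coast = rollback_prob * event_cost * (chi - 1) / 2
    return save + coast


def best_static_interval(save_cost, event_cost, rollback_prob, max_chi=64):
    """Exhaustively pick the chi in [1, max_chi] with the lowest toy overhead."""
    return min(
        range(1, max_chi + 1),
        key=lambda chi: overhead_per_event(chi, save_cost, event_cost, rollback_prob),
    )


def adapt_interval(chi, prev_cost, cost, direction, max_chi=64):
    """One hypothetical hill-climbing step for a dynamic interval.

    Keep moving chi in the current direction while the observed cost does
    not get worse; reverse direction when it does. chi stays in [1, max_chi].
    """
    if cost > prev_cost:
        direction = -direction
    chi = max(1, min(max_chi, chi + direction))
    return chi, direction
```

In a real simulator the cost passed to `adapt_interval` would be measured over a window of events rather than computed from a formula, which is what allows the interval to follow changes in rollback behavior over the lifetime of the simulation.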
The successful application of optimistic synchronization techniques in parallel simulation requires that rollback overheads be contained. The chief contributions to rollback overhead in a Time Warp simulation are the time required to save state information and the time required to restore a previous state. Two competing techniques for reducing rollback overhead are periodic checkpointing (Lin and Lazowska, 1989) and incremental state saving (Bauer et al., 1991). This paper analytically compares the performance of periodic checkpointing with that of incremental state saving. The analytical model derived for periodic checkpointing is based almost entirely on the previous model developed by Lin (Lin and Lazowska, 1989). The analytical model for incremental state saving has been developed for this study. The comparison assumes an optimal checkpoint interval and shows under what simulation parameters each technique performs best.
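The difference between the two techniques can be sketched in a few lines. The following is a minimal illustration, not the analytical model from the paper: the class names, the dictionary-based state, and the convention that events are plain callables are all assumptions. Periodic checkpointing copies the whole state every χ events and re-executes events to coast forward on rollback; incremental state saving logs the old value of each variable as it is overwritten and undoes the log in reverse.

```python
import copy

_MISSING = object()  # marks a key that did not exist before a write


class PeriodicCheckpointer:
    """Hypothetical periodic checkpointing: full snapshot every chi events."""

    def __init__(self, state, chi):
        self.state = state
        self.chi = chi
        self.events = []                                # processed events, in order
        self.checkpoints = {0: copy.deepcopy(state)}    # event count -> snapshot

    def process(self, event):
        event(self.state)                               # an event mutates the state
        self.events.append(event)
        n = len(self.events)
        if n % self.chi == 0:
            self.checkpoints[n] = copy.deepcopy(self.state)

    def rollback(self, n):
        """Restore the state as of the first n events; return events re-executed."""
        base = max(k for k in self.checkpoints if k <= n)
        self.state = copy.deepcopy(self.checkpoints[base])
        for event in self.events[base:n]:               # coast forward
            event(self.state)
        self.events = self.events[:n]
        self.checkpoints = {k: v for k, v in self.checkpoints.items() if k <= n}
        return n - base


class IncrementalSaver:
    """Hypothetical incremental state saving: log old values, undo in reverse."""

    def __init__(self, state):
        self.state = state
        self.log = []       # entries of (event index, key, old value)
        self.count = 0      # number of completed events

    def write(self, key, value):
        self.log.append((self.count, key, self.state.get(key, _MISSING)))
        self.state[key] = value

    def end_event(self):
        self.count += 1

    def rollback(self, n):
        """Restore the state as of the first n events; return log entries undone."""
        undone = 0
        while self.log and self.log[-1][0] >= n:
            _, key, old = self.log.pop()
            if old is _MISSING:
                del self.state[key]
            else:
                self.state[key] = old
            undone += 1
        self.count = n
        return undone
```

Under these assumptions, the rollback cost of the first class depends on how far the target lies past the nearest earlier checkpoint, while the rollback cost of the second depends on how many writes must be undone. A comparison of the two techniques has to weigh exactly this kind of dependence on simulation parameters.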