Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems

Dong, Xiangyu; Muralimanohar, Naveen; Jouppi, Norm; Kaufmann, Richard; Xie, Yuan

doi:10.1145/1654059.1654117

Cited by 124 publications

(99 citation statements)

References 22 publications


“…In addition, to avoid excessively delaying LLC misses due to row conflicts while migrating, the PCM DIMMs are equipped with an extra pair of row-buffers per rank, used exclusively for migrations. Operated by the MC, these buffers communicate with the internal prefetching circuitry of the PCM DIMM [11,12], bypassing the original bank's row buffer. Since our migrations occur in sequence, two of these buffers are necessary only when the migration involves two banks of the same rank, and one buffer would suffice otherwise.…”

Section: Rank-based Page Placement (mentioning)

confidence: 99%

Page placement in hybrid memory systems

Ramos

Gorbatov

Bianchini

2011

Proceedings of the International Conference on Supercomputing

341

251


Phase-Change Memory (PCM) technology has received substantial attention recently. Because PCM is byte-addressable and exhibits access times in the nanosecond range, it can be used in main memory designs. In fact, PCM has higher density and lower idle power consumption than DRAM. Unfortunately, PCM is also slower than DRAM and has limited endurance. For these reasons, researchers have proposed memory systems that combine a small amount of DRAM and a large amount of PCM. In this paper, we propose a new hybrid design that features a hardware-driven page placement policy. The policy relies on the memory controller (MC) to monitor access patterns, migrate pages between DRAM and PCM, and translate the memory addresses coming from the cores. Periodically, the operating system updates its page mappings based on the translation information used by the MC. Detailed simulations of 27 workloads show that our system is more robust and exhibits lower energy-delay² than state-of-the-art hybrid systems.


“…Recent studies [6], [7] estimate that the annual increase in memory size and network bandwidth is 41% and 26%, respectively. Figure 1 shows the trends in both memory size and network bandwidth for the period between 2008 and 2020.…”

Section: Motivation (mentioning)

confidence: 99%
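The gap implied by the two growth rates quoted above can be checked with a short calculation. This is a minimal sketch, assuming both rates compound annually at a constant pace from a common 2008 baseline; the function names `compound` and `gap` are illustrative and do not come from the cited paper.

```python
# Annual growth rates quoted in the citation statement above.
MEMORY_GROWTH = 0.41     # memory size, per year
BANDWIDTH_GROWTH = 0.26  # network bandwidth, per year


def compound(rate: float, years: int) -> float:
    """Growth factor after `years` of compounding at `rate` per year."""
    return (1.0 + rate) ** years


def gap(years: int) -> float:
    """Ratio of memory growth to bandwidth growth after `years`."""
    return compound(MEMORY_GROWTH, years) / compound(BANDWIDTH_GROWTH, years)


if __name__ == "__main__":
    years = 2020 - 2008  # the period covered by the quoted figure
    print(f"memory grows x{compound(MEMORY_GROWTH, years):.1f}")
    print(f"bandwidth grows x{compound(BANDWIDTH_GROWTH, years):.1f}")
    print(f"memory-to-bandwidth gap x{gap(years):.2f}")
```

Under these assumptions, memory per node outgrows network bandwidth by a factor of nearly four over the twelve years, which is the imbalance the statement uses to motivate cheaper checkpointing.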

“…Our approach differentiates from theirs in that we provide techniques to reduce the interference of checkpoint for distributed memory clusters. Dong et al leverage PCRAM [7] for checkpointing and propose the hybrid local/global checkpointing mechanism. Their approach can be incorporated with the semi-blocking algorithm by relaxing the stall of computation when taking global checkpoint.…”

Section: Related Work (mentioning)

confidence: 99%

Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm

Meneses

Kalé

2012

2012 IEEE International Conference on Cluster Computing


The HPC community has seen a steady increase in the number of components in every generation of supercomputers. Assembling a large number of components into a single cluster makes a machine more powerful, but also much more prone to failures. Therefore, fault tolerance has become a major concern in HPC. To deal with node crashes in large systems, checkpoint/restart is by far the preferred method. A typical way to implement checkpoints is by using a blocking algorithm, which suspends the execution of the application while the checkpoint is safely stored. One limitation of the blocking algorithm is that it saturates the network bandwidth at the time of checkpoint. This problem will become even more critical because the projected network bandwidth increase will not match the increase in memory per node. To alleviate this problem, we have developed a semi-blocking checkpoint algorithm that overlaps execution of the application with transmission of checkpoints. Our implementation decomposes a checkpoint into small messages that are interleaved with application messages. The experimental results show a dramatic reduction in the checkpoint overhead for various applications. We present a model for our approach and use this model to compute the benefit of the semi-blocking algorithm for different failure rates predicted at Exascale. We estimate that our method can reduce the total execution time of an iterative scientific application by up to 22%.
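The decomposition idea in this abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the helper names `split_checkpoint` and `interleave`, the fixed chunk size, and the strict one-to-one alternation between application messages and checkpoint chunks are all assumptions made for illustration.

```python
from itertools import zip_longest


def split_checkpoint(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a checkpoint image into chunks of at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def interleave(app_messages, checkpoint_chunks):
    """Build a send order that alternates application and checkpoint traffic.

    Assumption: one checkpoint chunk follows each application message;
    whichever stream is longer is drained at the end.
    """
    schedule = []
    for app, chunk in zip_longest(app_messages, checkpoint_chunks):
        if app is not None:
            schedule.append(("app", app))
        if chunk is not None:
            schedule.append(("ckpt", chunk))
    return schedule


if __name__ == "__main__":
    chunks = split_checkpoint(b"checkpoint-image", 6)
    for kind, payload in interleave([b"m1", b"m2"], chunks):
        print(kind, payload)
```

Because no application message waits behind the whole checkpoint, the application keeps making progress while the checkpoint drains, which is the overlap the abstract describes.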


“…We then compare the efficiency of Euripus to three systems: 1) one that creates undo-log checkpoints every 10ms but redo logs every 1 hour (UndoLog+RL1h), 2) another that creates only Euripus's redo-log checkpoints (RedoLog), and 3) one that only creates redo logs every 1 hour (RedoLog 1h). We assume that all redo logs and a full checkpoint are stored in PCM, and obtain checkpoint-restore times from PCM through simulation (1s for a full checkpoint, 1.5 seconds for minutes-level, and 1.75s for seconds-level checkpoint⁵). The base error rate of the system was estimated to be 10⁻⁸ from field data [13,20].…”

Section: Error Recovery (mentioning)

confidence: 99%

“…Creating checkpoints infrequently, e.g. every 1 hour, has the worst efficiency, because the error frequency is higher than the checkpointing one, and the system cannot effectively recover from an error.…”

Footnote 5: Note that rollback to an incremental redo log starts by restoring the previous full checkpoint.

Footnote 6: The error rate $r_i$ at level $i$ is $r_i = \alpha \cdot r_{i-1}$, where $\alpha \le 1$ and $r_{total} = \sum_{i=0}^{l} r_i = \sum_{i=0}^{l} r\alpha^i$.

Section: Error Recovery (mentioning)

confidence: 99%
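The multi-level error-rate model in footnote 6 of the statement above is a finite geometric series, and a small calculation makes it concrete. This is a minimal sketch: the function names are hypothetical, and the lower bound of 0 on `alpha` is an added assumption (the source states only that α ≤ 1).

```python
def level_rates(base_rate: float, alpha: float, levels: int) -> list[float]:
    """Per-level error rates r_i = alpha * r_(i-1), with r_0 = base_rate.

    `levels` is l in the footnote, so levels + 1 rates are returned.
    """
    if not 0.0 <= alpha <= 1.0:  # assumption: alpha is non-negative
        raise ValueError("alpha must be in [0, 1]")
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return [base_rate * alpha ** i for i in range(levels + 1)]


def total_rate(base_rate: float, alpha: float, levels: int) -> float:
    """Total error rate: the sum of r * alpha**i for i = 0..levels."""
    return sum(level_rates(base_rate, alpha, levels))


if __name__ == "__main__":
    # Base rate taken from the quoted statement; alpha and the level
    # count are arbitrary illustrative values.
    print(level_rates(1e-8, 0.5, 2))
    print(total_rate(1e-8, 0.5, 2))
```

With α = 1 every level contributes the base rate, so the total is simply (l + 1) times r; smaller values of α make the higher levels progressively rarer.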

Euripus: A flexible unified hardware memory checkpointing accelerator for bidirectional-debugging and reliability

Doudalis

Prvulovic

2012

2012 39th Annual International Symposium on Computer Architecture (ISCA)


Bidirectional debugging and error recovery have different goals (programmer productivity and system reliability, respectively), yet they both require the ability to roll-back the program or the system to a past state. This rollback functionality is typically implemented using checkpoints that can restore the system/application to a specific point in time. There are several types of checkpoints, and bidirectional debugging and error-recovery use them in different ways. This paper presents Euripus, a flexible hardware accelerator for memory checkpointing which can create different combinations of checkpoints needed for bidirectional debugging, error recovery, or both. In particular, Euripus is the first hardware technique to provide consolidation-friendly undo-logs (for bidirectional debugging), to allow simultaneous construction of both undo and redo logs, and to support multi-level checkpointing for the needs of error-recovery. Euripus incurs low performance overheads (<5% on average), improves roll-back latency for bidirectional debugging by >30%, and supports rapid multi-level error recovery that allows >95% system efficiency even with very high error rates.
