2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) 2019
DOI: 10.1109/ccgrid.2019.00015

Application-Level Differential Checkpointing for HPC Applications with Dynamic Datasets

Abstract: High-performance computing (HPC) requires resilience techniques such as checkpointing in order to tolerate failures in supercomputers. As the number of nodes and the amount of memory in supercomputers keep increasing, the size of checkpoint data also grows dramatically, sometimes causing an I/O bottleneck. Differential checkpointing (dCP) aims to minimize the checkpointing overhead by writing only the data that has changed. This is typically implemented at the memory page level, sometimes complemented with hashing algorithms…
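The abstract mentions dCP complemented with hashing. The paper's own application-level method is cut off above, so the following is only a generic sketch of hash-based block comparison, not the authors' implementation. The 4096-byte block size, the SHA-1 digest, and the function names are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size (one typical memory page); not taken from the paper


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-1 digest per fixed-size block of `data`."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def differential_checkpoint(data: bytes, prev_hashes: list[bytes], block_size: int = BLOCK_SIZE):
    """Hypothetical helper: find the blocks that changed since the last checkpoint.

    Returns (dirty, hashes): `dirty` maps block index -> block bytes for every
    block whose digest differs from, or is missing in, `prev_hashes`;
    `hashes` should be passed as `prev_hashes` at the next checkpoint.
    """
    hashes = block_hashes(data, block_size)
    dirty = {}
    for idx, h in enumerate(hashes):
        if idx >= len(prev_hashes) or prev_hashes[idx] != h:
            dirty[idx] = data[idx * block_size:(idx + 1) * block_size]
    return dirty, hashes


state = bytearray(4 * BLOCK_SIZE)
dirty, hashes = differential_checkpoint(bytes(state), [])  # first checkpoint: all 4 blocks
state[BLOCK_SIZE + 10] = 0xFF                               # modify one byte in block 1
dirty, hashes = differential_checkpoint(bytes(state), hashes)
print(sorted(dirty))  # [1]
```

Because new block indices are always reported as dirty and the returned `hashes` list records the current length, this sketch tolerates a buffer that grows or shrinks between checkpoints; a matching restore step would also have to truncate the stored state when it shrinks.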

Cited by 8 publications (8 citation statements)
References 14 publications
“…Resilient checkpointing has been considered with the help of nonvolatile memory, as for instance implemented in PapyrusKV (Kim et al, 2017), a resilient key-value blob-storage. Other resilient checkpointing techniques include the self-checkpoint technique (Tang et al, 2018), which reduces common redundancies while writing checkpoints, or techniques reducing the amount of required memory through hierarchical checkpointing (Moody et al, 2010), or differential checkpointing (Keller and Bautista-Gomez, 2019).…”
Section: System Infrastructure Techniques for Resilience (mentioning, confidence: 99%)
“…In essence, during the checkpoint procedure one needs to store only the data whose values have changed since the previous checkpoint. This technique is called differential checkpointing [18] and can be implemented in two ways. The first uses the page dirty bits of the operating system (OS) to detect which pages have changed since the previous checkpoint.…”
Section: Differential Checkpoint (mentioning, confidence: 99%)
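The dirty-bit approach in this excerpt relies on the OS, which Python cannot read portably. As a rough illustration only, the hypothetical `DirtyTrackedBuffer` class below emulates page dirty bits in user space by marking every page a write touches; the 4096-byte page size is an assumption.

```python
PAGE_SIZE = 4096  # assumed page size; real systems query the OS for it


class DirtyTrackedBuffer:
    """Hypothetical user-space emulation of OS page dirty bits.

    Every write marks the pages it touches as dirty; checkpoint() copies only
    the dirty pages and then clears the bits.
    """

    def __init__(self, size: int, page_size: int = PAGE_SIZE):
        self.page_size = page_size
        self.data = bytearray(size)
        # Every page starts dirty, so the first checkpoint is a full one.
        self.dirty = set(range((size + page_size - 1) // page_size))

    def write(self, offset: int, payload: bytes) -> None:
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the tracked buffer")
        if not payload:
            return
        self.data[offset:offset + len(payload)] = payload
        first = offset // self.page_size
        last = (offset + len(payload) - 1) // self.page_size
        self.dirty.update(range(first, last + 1))

    def checkpoint(self) -> dict[int, bytes]:
        """Return {page index: page bytes} for dirty pages, then reset the bits."""
        ps = self.page_size
        pages = {p: bytes(self.data[p * ps:(p + 1) * ps]) for p in sorted(self.dirty)}
        self.dirty.clear()
        return pages


buf = DirtyTrackedBuffer(3 * PAGE_SIZE)
buf.checkpoint()                   # first checkpoint: pages 0, 1 and 2
buf.write(PAGE_SIZE - 2, b"abcd")  # straddles pages 0 and 1
print(sorted(buf.checkpoint()))    # [0, 1]
```

Unlike hash comparison, dirty bits record writes rather than value changes, so a page rewritten with identical contents is still saved.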
“…A number of optimizations have been proposed to reduce the amount of data to be stored. Differential approaches [10], [11], [18] update only the data that has changed in comparison to the previous checkpoint and are beneficial only when applications do not change substantially. In contrast to these works, our work focuses on the programmability aspect of application initiated multilevel C/R and optimizes the normal checkpoint procedure.…”
Section: Related Work (mentioning, confidence: 99%)
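Since differential approaches store only the data that changed relative to the previous checkpoint, a restart has to replay the stored differences on top of the last full checkpoint. A minimal sketch, assuming each diff is a hypothetical `{block index: block bytes}` mapping over fixed 4096-byte blocks and that the state size never changes:

```python
def restore(base: bytes, diffs: list[dict[int, bytes]], block_size: int = 4096) -> bytes:
    """Rebuild the latest state from a full checkpoint plus an ordered chain of diffs.

    Assumes the state size never changes, so every block index in a diff
    already exists in `base`.
    """
    state = bytearray(base)
    for diff in diffs:  # oldest first, so later diffs overwrite earlier ones
        for idx, block in diff.items():
            start = idx * block_size
            state[start:start + len(block)] = block
    return bytes(state)


base = bytes(3 * 4096)
d1 = {1: b"\x01" * 4096}
d2 = {1: b"\x02" * 4096, 2: b"\x03" * 4096}
latest = restore(base, [d1, d2])
print(latest[4096], latest[8192])  # 2 3
```

If most blocks change between checkpoints, each diff is nearly as large as a full checkpoint, which is consistent with the caveat that differential approaches pay off only when applications do not change substantially.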
“…So far, the solutions we have proposed in this work rely on coordinated system-level checkpoints (DMTCP) or uncoordinated ones (per process, at the application layer). However, other variants also exist, such as semi-coordinated checkpoints or differential checkpointing (a method that reduces the Input/Output load of taking consecutive checkpoints by updating only those data blocks that have changed since the last checkpoint was stored [75]), which we have not yet considered.…”
Section: Future Work (unclassified)
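The block-update idea described in this excerpt can be sketched as an in-place rewrite of a stored checkpoint file, skipping the I/O for unchanged blocks. This version compares blocks directly against a copy of the previously saved state, which trades memory for the hashing work used elsewhere; the block size and the `update_checkpoint_file` name are illustrative assumptions.

```python
import os
import tempfile
from typing import Optional

BLOCK = 4096  # assumed block size, chosen for illustration


def update_checkpoint_file(path: str, data: bytes, prev: Optional[bytes]) -> int:
    """Rewrite only the blocks of the stored checkpoint that changed.

    `prev` is the state saved by the last checkpoint (None for the first one).
    Returns how many blocks were written to disk.
    """
    first = prev is None or not os.path.exists(path)
    written = 0
    with open(path, "wb" if first else "r+b") as f:
        for start in range(0, len(data), BLOCK):
            block = data[start:start + BLOCK]
            if not first and prev[start:start + BLOCK] == block:
                continue  # unchanged since the last stored checkpoint: skip the I/O
            f.seek(start)
            f.write(block)
            written += 1
        f.truncate(len(data))  # discard a stale tail if the data shrank
    return written


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ckpt.bin")
    state = bytearray(4 * BLOCK)
    print(update_checkpoint_file(path, bytes(state), None))   # 4
    saved = bytes(state)
    state[3 * BLOCK] = 1
    print(update_checkpoint_file(path, bytes(state), saved))  # 1
```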