2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC) 2020
DOI: 10.1109/hipc50609.2020.00040
|View full text |Cite
|
Sign up to set email alerts
|

Design and Study of Elastic Recovery in HPC Applications

Abstract: The efficient utilization of current supercomputing systems with deep storage hierarchies demands scientific applications that are capable of leveraging such heterogeneous hardware. Fault tolerance, and checkpointing in particular, is one of the most time-consuming aspects if not handled correctly. High checkpoint performance can be achieved using optimized multilevel checkpoint and restart libraries. Unfortunately, those libraries do not allow for restarts with a modified number of processes or scientific pos… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 29 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?