2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) 2022
DOI: 10.1109/sbac-pad55451.2022.00037

Strategies for Fault-Tolerant Tightly-Coupled HPC Workloads Running on Low-Budget Spot Cloud Infrastructures

Cited by 8 publications (10 citation statements)
References 17 publications
“…23 We integrated into HPC@Cloud existing fault tolerance strategies to address the unreliability of spot instances. 6 These strategies are built upon existing technologies such as BLCR and the innovative ULFM, which is currently under development by the MPI Forum. Although numerous studies have explored fault tolerance using BLCR and other well-established technologies, they often do not comprehensively analyze the advantages and disadvantages of utilizing more affordable, less reliable cloud instances compared to high-end options in a public cloud and HPC context.…”
Section: Related Work (mentioning)
confidence: 99%
“…We integrated into HPC@Cloud existing fault tolerance strategies to address the unreliability of spot instances. 6 These strategies are built upon existing technologies such as BLCR and the innovative ULFM, which is currently under development by the MPI Forum.…”
Section: Related Work (mentioning)
confidence: 99%