SC18: International Conference for High Performance Computing, Networking, Storage and Analysis 2018
DOI: 10.1109/sc.2018.00072

PRISM: Predicting Resilience of GPU Applications Using Statistical Methods

Cited by 22 publications (10 citation statements)
References 33 publications
“…Recently, there have been some works that propose machine-learning techniques to predict soft errors in GPU programs [23,37]. The proposed prediction frameworks utilize the program characteristics as input features in their ML models, similar to works based on failure prediction for HPC systems [21].…”
Section: Correlating Code Characteristics and Fault Behavior (mentioning)
confidence: 99%
“…Recently, there have been many efforts to utilize machine learning [13], [35], [36], [37] to address resilience problems. IPAS [37] uses machine learning to decide on instructions that will likely to lead to corruption and duplicates them.…”
Section: Related Work (mentioning)
confidence: 99%
“…Minotaur can potentially be applicable to other hardware platforms as well. Although this work focuses on CPUs, recent resiliency analyses on GPUs [55,62,83], for example, can potentially benefit from the concepts of Minotaur to improve runtime and/or accuracy. Approximate computing: Many techniques have been proposed that leverage approximate computing at the level of software [8,11,23,71,85,100,106,110,114,121,125,127], programming languages [15,20,74,86,87,101,102] and hardware [5,11,16,33,42,46,53,56,77,103,105,112,126,128,133] for improved performance, energy, or reliability.…”
Section: Related Work (mentioning)
confidence: 99%