2016
DOI: 10.1109/tc.2015.2444855

Evaluation and Mitigation of Radiation-Induced Soft Errors in Graphics Processing Units

Cited by 85 publications (75 citation statements)
References: 29 publications

“…The main causes of transient errors are voltage-frequency variations, temperature perturbations, and electromagnetic interferences. Lately, neutron-induced errors have been shown to be critical for HPC systems [15,46]. A flux of about 13 neutrons/(cm² · h) reaches ground at sea level, and the flux exponentially increases with altitude [29].…”
Section: Background 2.1, Transient Errors Effects in HPC (mentioning, confidence: 99%)

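As a hedged worked example of what the quoted sea-level flux of about 13 neutrons per cm² per hour implies, the sketch below converts a device's neutron cross-section σ (in cm²) into a FIT rate (failures per 10⁹ device-hours) using FIT = σ × Φ × 10⁹, and into a mean time between failures. The cross-section value and the helper names `fit_rate` and `mtbf_hours` are hypothetical illustrations, not values or code from the paper. Altitude scaling is not modeled because the quote gives no growth rate, so a different flux must be passed in explicitly.

```python
import math

# Sea-level neutron flux quoted in the citation statement, in neutrons / (cm^2 * h).
SEA_LEVEL_FLUX = 13.0
# One FIT is one failure per 10^9 device-hours.
HOURS_PER_FIT_UNIT = 1e9


def fit_rate(cross_section_cm2: float, flux: float = SEA_LEVEL_FLUX) -> float:
    """Expected failures per 10^9 hours for a device with the given
    cross-section (cm^2) exposed to the given flux (n / (cm^2 * h))."""
    if cross_section_cm2 < 0 or flux < 0:
        raise ValueError("cross-section and flux must be non-negative")
    return cross_section_cm2 * flux * HOURS_PER_FIT_UNIT


def mtbf_hours(cross_section_cm2: float, flux: float = SEA_LEVEL_FLUX) -> float:
    """Mean time between failures in hours (infinite if the failure rate is zero)."""
    rate_per_hour = fit_rate(cross_section_cm2, flux) / HOURS_PER_FIT_UNIT
    return math.inf if rate_per_hour == 0 else 1.0 / rate_per_hour


if __name__ == "__main__":
    sigma = 1e-7  # hypothetical cross-section in cm^2, for illustration only
    print(f"{fit_rate(sigma):.0f} FIT, MTBF ~ {mtbf_hours(sigma):.0f} h")
```

Because FIT is linear in both σ and Φ, doubling the flux (for example, at higher altitude) doubles the FIT rate and halves the MTBF for the same device.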
“…Highly parallel computing architectures, like the Xeon Phi, have some reliability weaknesses [15,16,21,49]. For instance, a single particle generating a radiation-induced failure in the scheduler or shared memories (used to expedite parallel executions), is likely to affect the computation of several parallel threads.…”
Section: Background 2.1, Transient Errors Effects in HPC (mentioning, confidence: 99%)

“…It is worth noting that a significant component of Crashes is caused by radiation corruption of the device control circuitry. Errors affecting instruction memory, the GPU hardware schedulers, or the CPU-GPU interface could lead to application crash or system hang, independently of the algorithm properties [de Oliveira et al 2016]. Crashes, while more frequent than SDCs in HOG, are considered less critical as they are easily detected [Li et al 2008;Nakka et al 2005;Pattabiraman et al 2006].…”
Section: Crash (mentioning, confidence: 99%)

“…While extremely efficient in terms of FLOP/s (floating-point operations per second) and FLOPs-per-WATT, modern GPUs have been shown to be prone to experience radiation-induced corruption [DeBardeleben et al 2013;Wunderlich et al 2013;Gomez et al 2014;Oliveira et al 2014;de Oliveira et al 2016]. GPU architecture may be particularly susceptible to be corrupted by radiation for three main reasons.…”
Section: Introduction (mentioning, confidence: 99%)