2014
DOI: 10.1109/tns.2014.2362014

Modern GPUs Radiation Sensitivity Evaluation and Mitigation Through Duplication With Comparison

Abstract: Graphics processing units (GPUs) are increasingly common in both safety-critical and high-performance computing (HPC) applications. Some current supercomputers are composed of thousands of GPUs, so the probability of device corruption becomes very high. Moreover, the GPU's parallel capabilities are very attractive for the automotive and aerospace markets, where reliability is a serious concern. In this paper, the neutron sensitivity of modern GPU caches and internal resources is experimentally evaluated. …

Citations: cited by 36 publications (15 citation statements)
References: 20 publications (29 reference statements)
“…The efficiency of the proposed strategies was experimentally evaluated and compared with chip's ECC protection mechanism. It was demonstrated that DWC strategies can be more effective than ECC when input data are duplicated [13].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Reference [10] presents the radiation sensitivity evaluation of a modern Graphic Processing Units (GPUs) designed in 28nm technology node, and composed by an array of streaming multi-processors which share the L2 cache memory. It also provides a hardening strategy based on Duplication with Comparison.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…As Time does not duplicate resources like caches, some SDC may still affect the output. For a more detailed explanation and details about implementation of DWC strategies in GPUs please refer to [11]. The Checkpoint-rollback strategy imposes an overhead from 5% to 15% of execution time [6] or even higher depending on checkpoint frequency. The Checkpoint-rollback strategy cannot be tested under radiation due to the artificially high errors frequency.…”
Section: Software-based Hardening
Citation type: mentioning
confidence: 99%
“…All errors with Spatial were detected and corrected, while in E-O Spatial errors in shared resources can pass undetected as the redundant block can be executed in the same SM. For a detailed discussion, please refer to [11].…”
Section: Duplication With Comparison
Citation type: mentioning
confidence: 99%
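
The statements above refer to Duplication With Comparison (DWC) without showing its basic shape. The following is a minimal host-side sketch of the general idea only, assuming a pure function stands in for a GPU kernel; it is not the implementation evaluated in the paper. The names `run_with_dwc`, `vector_sum`, and `flaky_kernel_factory` are hypothetical, and the fault injector is purely illustrative.

```python
import copy


def run_with_dwc(kernel, inputs):
    """Run `kernel` twice on independent copies of `inputs` and compare results."""
    # Duplicate the input data so one corrupted copy cannot affect both runs.
    first = kernel(copy.deepcopy(inputs))
    second = kernel(copy.deepcopy(inputs))
    # A mismatch means an error was detected; comparison alone cannot
    # tell which of the two results is the wrong one.
    return first, first == second


def vector_sum(values):
    """Stand-in for a kernel: a deterministic reduction."""
    return sum(values)


def flaky_kernel_factory():
    """Hypothetical fault injector: corrupts the result of its first call only."""
    calls = {"n": 0}

    def kernel(values):
        calls["n"] += 1
        result = sum(values)
        # Flip the lowest bit once to mimic a silent data corruption.
        return result ^ 1 if calls["n"] == 1 else result

    return kernel


ok_result, ok_match = run_with_dwc(vector_sum, [1, 2, 3, 4])
bad_result, bad_match = run_with_dwc(flaky_kernel_factory(), [1, 2, 3, 4])
print(ok_result, ok_match)
print(bad_result, bad_match)
```

The sketch does not model hardware placement. As the quoted statements note, a redundant copy that shares a cache or runs on the same SM as the original can be hit by the same fault, in which case both results agree and the error passes undetected.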