2020
DOI: 10.1016/j.microrel.2020.113856

Evaluating the soft error sensitivity of a GPU-based SoC for matrix multiplication

Abstract: System-on-Chip (SoC) devices can be composed of low-power multicore processors combined with a small graphics accelerator (or GPU), which offers a trade-off between computational capacity and low power consumption. In this work we use the LLFI-GPU fault injection tool on one of these devices to compare the sensitivity to soft errors of two different CUDA versions of a matrix multiplication benchmark. Specifically, we perform fault injection campaigns on a Jetson TK1 development kit, a board equipped with a SoC in…

Cited by 9 publications (7 citation statements)
References 16 publications (26 reference statements)

“…Since the injection is purely software, the execution can be performed on the live hardware, significantly reducing the injection overhead (one fault injection takes as much time as a normal execution) and the fault injection design (no need to simulate and control the circuit). The ease of use and efficiency of software fault injection makes it popular among research groups evaluating the reliability of neural networks [62], [69], [70], [101], [161]-[163]. However, the very high level of abstraction at which software fault injection is performed imposes a need to carefully select the error model.…”
Section: Software Fault Injection (mentioning)
confidence: 99%
“…An alternative method to analyze the vulnerability of different components of the GPU is to inject faults at the architecture-level or into compiler-level intermediate representations [38], [39]. In [40] we employed the LLFI injection tool to evaluate the soft error sensitivity of the algorithms Elem and Block in the same Jetson TK1 used in this paper. We injected single bit-flips in the results of the instructions of randomly chosen threads.…”
Section: B Types Of Errors (mentioning)
confidence: 99%
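The fault model in the statement above (a single bit-flip in the result of one instruction of a randomly chosen thread) can be illustrated with a small host-side sketch. This is not LLFI-GPU's implementation or API. It assumes, for illustration only, that one "thread" computes one output element of the product and that each multiply-accumulate step stands in for one instruction. The names `flip_bit_float32`, `matmul_with_fault` and `random_fault` are hypothetical.

```python
import random
import struct


def flip_bit_float32(value, bit):
    """Return value, rounded to float32, with one bit of its IEEE-754 encoding flipped."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return flipped


def matmul_with_fault(a, b, thread=None, step=0, bit=0):
    """Multiply a by b, one simulated thread per output element.

    If thread is an (i, j) pair, the accumulator of that thread is corrupted
    right after multiply-accumulate number `step` (assumed fault site).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
                if thread == (i, j) and k == step:
                    acc = flip_bit_float32(acc, bit)  # inject the soft error
            c[i][j] = acc
    return c


def random_fault(rows, inner, cols, rng):
    """Pick a random thread, instruction step and bit position."""
    thread = (rng.randrange(rows), rng.randrange(cols))
    return thread, rng.randrange(inner), rng.randrange(32)


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[1.0, 0.0], [0.0, 1.0]]
    golden = matmul_with_fault(a, b)  # fault-free reference run
    thread, step, bit = random_fault(2, 2, 2, random.Random(0))
    faulty = matmul_with_fault(a, b, thread, step, bit)
    print("fault site:", thread, step, bit)
    print("output differs from golden run:", faulty != golden)
```

Comparing the faulty output with the fault-free run is the usual way to decide whether an injected fault was masked or led to a corrupted result. A real campaign would repeat this over many randomly chosen fault sites.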
“…Previous studies have stated that parallel architectures, particularly GPUs, have a high fault rate because of the high amount of available resources [7], [12], [13]. Recent works have identified some peculiar reliability weaknesses of GPUs architecture, suspecting that the corruption of the GPU hardware scheduler or shared memories can severely impact the computation of several parallel threads [3], [7], [12], [14], [15]. As a result, multiple GPU output elements can potentially be corrupted, effectively undermining several applications' reliability, including CNNs [16], [17].…”
Section: Radiation Induced SDCs and DUEs in GPUs (mentioning)
confidence: 99%
“…Few works have studied the DUEs on GPUs on beam experiments [3], [10], [18]- [20]. However, most of the mentioned works do not present a detailed analysis of the events that cause DUEs on GPUs, not allowing a deep investigation of the weakness of the GPUs.…”
Section: Radiation Induced SDCs and DUEs in GPUs (mentioning)
confidence: 99%