2014 IEEE 28th International Parallel and Distributed Processing Symposium 2014
DOI: 10.1109/ipdps.2014.123

Evaluating the Impact of SDC on the GMRES Iterative Solver

Abstract: Increasing parallelism and transistor density, along with increasingly tighter energy and peak power constraints, may force exposure of occasionally incorrect computation or storage to application codes. Silent data corruption (SDC) will likely be infrequent, yet one SDC suffices to make numerical algorithms like iterative linear solvers cease progress towards the correct answer. Thus, we focus on resilience of the iterative linear solver GMRES to a single transient SDC. We derive inexpensive checks t…

Citations: cited by 67 publications (68 citation statements)
References: 20 publications (26 reference statements)
“…Bridges et al [30] propose linear solvers to tolerant soft faults using selective reliability. Elliot et al [22] design a fault-tolerant GMRES capable of converging despite silent errors. Bronevetsky and de Supinski [11] provide a comparative study of detection costs for iterative methods.…”
Section: Silent Error Detection and Correction
Citation type: mentioning
confidence: 99%
“…Bridges et al [34] propose linear solvers to tolerant soft faults using selective reliability. Elliot et al [35] design a fault-tolerant GMRES capable of converging despite latent errors. Bronevetsky and de Supinski [36] provide a comparative study of detection costs for iterative methods.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…As mentioned in previous sections, the risk of silent data corruptions is increasing. Several studies have explored the impact of SDCs in execution results [16,43,73]. These studies show that a majority of SDCs leads to noticeable impacts such as crashes and hangs but that only a small fraction of them actually corrupt the results.…”
Section: Mitigating Silent Data Corruptions
Citation type: mentioning
confidence: 99%