2016 IEEE International Conference on Cluster Computing (CLUSTER) 2016
DOI: 10.1109/cluster.2016.99

TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods

Abstract: Although iterative solvers such as the Conjugate Gradients method (CG) have been studied for over fifty years, fault tolerance for such solvers has received much attention only in recent years. For iterative solvers, two major reliable recovery strategies exist: checkpoint-restart for backward recovery, and redundancy techniques for forward recovery. Important redundancy techniques, such as ABFT techniques for sparse matrix-vector products (SpMxV), have recently been proposed, which increase the resilience of C…

Citations: cited by 4 publications (2 citation statements)
References: 27 publications
“…Bit flips are introduced at random times and positions, and the effects are classified according to the resulting runtimes and solution errors. Dichev and Nikolopoulos [13] propose and experimentally evaluate a specific form of dual modular redundancy, where all computations are performed twice for improved redundancy, in order to detect and correct soft errors in the PCG method. More work in the area of soft error detection and correction in the CG and PCG methods has been published by Shantharam et al [28] and Fasi et al [15].…”
Section: Related Work (mentioning)
confidence: 99%
“…Bosilca et al [8,9,34] suggest an algorithm for integrating algorithm-specific with general-purpose fault-tolerance techniques. While Pachajoa and Gansterer [44] evaluate the inherent resilience properties of CG after a node failure, others discuss the related but independent problem of soft errors in CG [1,10,27,30,50,51].…”
Section: Introduction (mentioning)
confidence: 99%
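
The dual modular redundancy idea described in the first citation statement (performing computations twice so that soft errors can be detected and corrected) can be illustrated with a small sketch. This is not the TwinPCG implementation: it duplicates only the matrix-vector product of a Jacobi-preconditioned CG loop, and it assumes that a mismatch between the two copies is resolved by a third recomputation acting as a tie-breaker. The names `checked_matvec`, `pcg`, and `fault_at`, as well as the simulated fault, are hypothetical and chosen for illustration only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, x):
    return [dot(row, x) for row in A]

def checked_matvec(A, x, corrupt=False):
    """Compute A*x twice and compare; recompute once more on a mismatch."""
    first = matvec(A, x)
    if corrupt:
        first[0] = first[0] + 1.0  # simulated soft error in one copy (assumption)
    second = matvec(A, x)
    if first == second:
        return first, False
    # Assumed tie-breaker: a third run decides which copy to trust.
    third = matvec(A, x)
    if third == first or third == second:
        return third, True
    raise RuntimeError("no two copies agree")

def pcg(A, b, fault_at=None, tol=1e-12, max_iter=100):
    """Jacobi-preconditioned CG with a duplicated matrix-vector product."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                # residual for x = 0
    z = [r[i] / A[i][i] for i in range(n)]     # Jacobi preconditioner
    p = list(z)
    rz = dot(r, z)
    detected = 0
    for k in range(max_iter):
        q, mismatch = checked_matvec(A, p, corrupt=(k == fault_at))
        detected += mismatch
        alpha = rz / dot(p, q)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * q[i] for i in range(n)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, detected

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x_clean, d_clean = pcg(A, b)
x_fault, d_fault = pcg(A, b, fault_at=1)
```

Because the corrupted copy is discarded before it enters the iteration, the faulty run continues forward without rolling back to a checkpoint, which is the sense in which redundancy enables forward recovery.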