2008
DOI: 10.1145/1353534.1346315

Understanding the propagation of hard errors to software and implications for resilient system design

Abstract: With continued CMOS scaling, future shipped hardware will be increasingly vulnerable to in-the-field faults. To be broadly deployable, the hardware reliability solution must incur low overheads, precluding use of expensive redundancy. We explore a cooperative hardware-software solution that watches for anomalous software behavior to indicate the presence of hardware faults. Fundamental to such a solution is a characterization of how hardware faults in different microarchitectural structures of a modern process…

Cited by 70 publications (127 citation statements)
References 49 publications
“…We inject faults under µarch-level stuck-at, gate-level stuck-at, and gate-level delay fault models, and use the previously studied SWAT detection techniques to understand their system-level manifestation [12]. We show that, in general, µarch-level stuck-at faults do not result in similar system-level fault manifestation as gate-level stuck-at or delay faults.…”
Section: Contributions
Citation type: mentioning
confidence: 99%
“…For this purpose, we use the SWAT symptom-based detection scheme because these detectors essentially capture how hardware faults manifest into the system level and software [12].…”
Section: Studying System-level Effects
Citation type: mentioning
confidence: 99%