Networked Windows NT system field failure data analysis

Xu, Jinli; Kalbarczyk, Zbigniew; Iyer, Ravishankar K.

doi:10.1109/prdc.1999.816227

Cited by 34 publications

(1 citation statement)

References 17 publications

“…Software faults which manifest permanently, also known as Bohrbugs, are likely to fix and discover during the pre-operational phases of system life cycle (e.g., structured design, design review, quality assurance, unit, component and integration testing, alpha/beta test), as well as by means of traditional debugging techniques. Conversely, software faults which manifest transiently, also known as Heisenbugs, cannot be reproduced systematically (Huang, Jalote, & Kintala, 1994), and they have been demonstrated to be the major cause of failures in software systems, especially during the system operational phase (Sullivan & Chillarege, 1991; Chillarege, Biyani, & Rosenthal, 1995; Xu, Kalbarczyk, & Iyer, 1999).…”

Section: Introduction (mentioning)

confidence: 99%

A Recovery-Oriented Approach for Software Fault Diagnosis in Complex Critical Systems

Carrozza

Natella

2011

International Journal of Adaptive, Resilient and Autonomic Systems

This paper proposes an approach to software fault diagnosis in complex fault-tolerant systems, encompassing the phases of error detection, fault location, and system recovery. Errors are detected in the first phase, exploiting the operating system support. Faults are identified during the location phase, adopting a machine learning approach; this phase then triggers the proper recovery action for the fault that occurred, which is actuated in the third phase. Feedback actions are also adopted in the location phase to improve detection quality over time. A real-world application from the Air Traffic Control field has been used as a case study for evaluating the proposed approach. Experimental results, achieved by means of fault injection, show that the diagnosis engine is able to diagnose faults with high accuracy and at a low overhead.
