Proceedings 1999 Pacific Rim International Symposium on Dependable Computing
DOI: 10.1109/prdc.1999.816227
Networked Windows NT system field failure data analysis

Abstract: This paper presents a measurement-based dependability study of a Networked Windows NT system based on field data collected from NT System Logs from 503 servers running in a production environment over a four-month period. The event logs at hand contain only system reboot information. We study individual server failures and domain behavior in order to characterize failure behavior and explore error propagation between servers. The key observations from this study are: (1) system software and hardware failures …

citations
Cited by 34 publications
(1 citation statement)
references
References 17 publications
“…Software faults which manifest permanently, also known as Bohrbugs, are likely to fix and discover during the pre-operational phases of system life cycle (e.g., structured design, design review, quality assurance, unit, component and integration testing, alpha/beta test), as well as by means of traditional debugging techniques. Conversely, software faults which manifest transiently, also known as Heisenbugs, cannot be reproduced systematically (Huang, Jalote, & Kintala, 1994), and they have been demonstrated to be the major cause of failures in software systems, especially during the system operational phase (Sullivan & Chillarege, 1991; Chillarege, Biyani, & Rosenthal, 1995; Xu, Kalbarczyc, & Iyer, 1999).…”
Section: Introduction (citation type: mentioning)
confidence: 99%