September 1991
DOI: 10.1109/2.84898

High-availability computer systems

Abstract: Today's highly available systems deliver four years of uninterrupted service. The challenge is to build systems with 100-year mean time to failure and one-minute repair times. Paradoxically, the larger a system is, the more critical, but less likely, it is to be highly available. We can build small ultra-available modules, but building large systems involving thousands of modules and millions of lines of code is a poorly understood art, even though such large systems are a core technology of modern s…

Cited by 260 publications (115 citation statements)
References 7 publications
“…In other words, since bug fixing is not always possible for continuously running software systems, patch release is becoming much popular to maintain software applications in operational phase. However, as reported in Gray and Siewiorek [22] and Grottke and Trivedi [23], most software failures are transient in nature and will disappear if the operation is retried later in slightly different context. These software bugs which cause transient failure are called Mandelbugs, and it is in general difficult to characterize their root origin.…”
Section: Introduction
Citation type: mentioning
confidence: 91%
“…Further, few researchers has used computer systems to detect the errors which results in the failure of operating system with the help of comparators coding. On the same side we had observed the conditions over which the simple comparator algorithm is compatible or not [3]. Moreover other scientist had performed experiment to use one of the input output standard to stimulate particular type of sensor.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In addition, past studies have shown that most hardware and software errors are transient, recoverable failures [29]. As such, the Rover toolkit model balances the need to hide hardware and software faults with the need to avoid overly burdening programmers and lowering performance in the normal case.…”
Section: Failures
Citation type: mentioning
confidence: 99%
“…There is a large body of research on logging and distributed fault-tolerant transactions; for an excellent discussion of some of the issues, see [28] and [29].…”
Section: Reliable Execution
Citation type: mentioning
confidence: 99%