2004
DOI: 10.1016/j.peva.2003.07.007

Improving availability with recursive microreboots: a soft-state system case study

Cited by 71 publications (46 citation statements)
References: 50 publications
“…All runtime data will be collected. [4] Result analysis: After acquiring reliability evaluation results, system operators, application developers and middleware vendors can study the result carefully to find out the best-of-the-breed fault tolerant configuration, select another middleware product or redesign some parts of the system. The second and the third steps are iterative.…”
Section: Approach Overview (mentioning)
confidence: 99%
“…In Java, for example, "when a program violates the semantic constraints of the Java programming language, the Java virtual machine signals this error to the program as an exception" [24]. Some experiments in [4] show that almost all underlying faults can be manifested by Java exceptions. At the same time, in terms of the philosophy of exception handling, throwing and catching exceptions become the most popular mechanisms to deal with faults when programming middleware products and applications.…”
Section: Fault Model (mentioning)
confidence: 99%
“…The possibility of returning to the initial state at any time is common in reactive systems to prevent resource leaks in long-running executions, for instance through micro-reboots [2]. Communication protocols in the IP and telephony communities regulate the coexistence of interacting features.…”
Section: Well-architected Systems (mentioning)
confidence: 99%
“…In this section, we discuss our application of recursive restarts (RR) [41], a recently proposed technique for achieving high availability that exploits partial restarts at various levels within complex software infrastructures to recover from transient failures and rejuvenate software components such that overall mean-time-to-recover (MTTR) is minimized [6]. We had two main goals in applying RR to Mercury. The first was to partially remove the human from the loop in ground station control by automating recovery from common transient failures that were restart curable.…”
Section: B High Availability Upgrades (mentioning)
confidence: 99%
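The partial-restart idea described in the statement above can be illustrated with a small sketch. It is not taken from the cited paper: the `Node` class, the `recover` function, and the component names are hypothetical, and the health check is a stand-in supplied by the caller. The sketch assumes components are arranged in a tree, that restarting a node restarts its whole subtree, and that recovery starts at the smallest subtree and escalates to the parent only if the failure persists.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical restart-tree node; restarting it restarts its whole subtree."""
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        # Attach a child and remember its parent so recovery can escalate.
        child.parent = self
        self.children.append(child)
        return child

    def subtree(self) -> List[str]:
        # Names of this node and every descendant, in depth-first order.
        names = [self.name]
        for child in self.children:
            names.extend(child.subtree())
        return names


def recover(
    failed: Node,
    is_healthy: Callable[[], bool],
    restart: Callable[[Node], None],
) -> List[str]:
    """Restart the failed node; while the failure persists, restart the parent.

    Returns the names of the subtree roots that were restarted, in order.
    """
    attempts: List[str] = []
    node: Optional[Node] = failed
    while node is not None:
        restart(node)
        attempts.append(node.name)
        if is_healthy():
            break
        node = node.parent  # escalate to the next larger subtree
    return attempts
```

Restarting the smallest subtree first keeps the common case cheap, which is the motivation the statement gives for minimizing MTTR; the full restart at the root remains the fallback when nothing smaller cures the failure.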
“…GSML has enabled independent development of commodity ground station services and a framework for extending these services to commodity satellite operations tasks [22, 31]. Our application of recursive restarts [6], a ROC high-availability technique, has reduced recovery time by a factor of four in Mercury systems and eliminated the adverse effects of many ground station software failure modes. Mercury support of a scientific small satellite mission, QuakeSat-1 [14, 24], has provided low-cost flexible satellite contact opportunities that currently do not exist among commercial providers.…”
Section: Introduction (mentioning)
confidence: 99%