Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles - SOSP '95 1995
DOI: 10.1145/224056.224058

Hypervisor-based fault tolerance

Abstract: Protocols to implement a fault-tolerant computing system are described. These protocols augment the hypervisor of a virtual machine manager to coordinate a primary virtual machine and its backup. The result is a fault-tolerant computing system that does not require modifying the hardware, operating system, or application programs. A prototype system was constructed for HP's PA-RISC instruction-set architecture. Using this prototype, engineering issues and performance implications of the approach were explored.

citations
Cited by 191 publications
(80 citation statements)
references
References 21 publications
“…To effectively utilize the array bandwidth, the server interleaves (i.e., stripes) each media stream among disks in the array. The unit of data interleaving, referred to as a media block or a stripe unit, denotes the maximum amount of logically contiguous data 3 Several techniques have been proposed which scramble media streams prior to network transmission to enable approximate reconstruction in case of packet losses [10,26]. The efficacy of these techniques validates our claim.…”
Section: Parity-based Reconstruction (supporting)
confidence: 65%
“…2 -Since the cause of the data loss is irrelevant to the recovery algorithm, the unscrambling algorithms in LRJ and LRM can be adapted to mask packet losses due to network congestion as well. 3 Thus, the IRAD architecture provides an integrated, scalable, end-to-end solution for failure recovery. In case of a disk failure, a redundant array must (1) perform online reconstruction, and thereby provide uninterrupted service to user requests, and (2) rebuild the failed disk onto a spare disk, so that the array can revert back to the normal operating mode.…”
Section: IRAD (mentioning)
confidence: 99%
“…When phase 1 completes, both processors write their data back to the directory (7), (8). At the directory P1 is ordered after P0, so P0's writeback applied (9), while the writeback from P1 is nacked (10). P1 retries the writeback (11), which is then accepted by the directory (12).…”
Section: Mist Complexity (mentioning)
confidence: 99%
“…They have shown that addressing the problem has the potential to (1) increase software reliability by enhancing software test coverage before release [43], (2) increase system reliability through replication based fault tolerance [9], (3) aid in multithreaded software engineering [42], and (4) enhance security by providing a tool to analyze an attack [13]. Many of these prior proposals either rely on the ability to replay a previously recorded execution [14,20,27,28,31,41,42], incur a performance overhead that is likely too high for always-on usage [5], require complex speculative hardware [11], or only guarantee determinism in well behaved programs [32].…”
Section: Introduction (mentioning)
confidence: 99%