The initial design for a distributed, fault-tolerant version of UNIX based on three-way atomic message transmission was presented in an earlier paper [3]. The implementation effort then moved from Auragen Systems to Nixdorf Computer, where it was completed. This paper describes the working system, now known as the TARGON/32. The original design left open questions in at least two areas: fault tolerance for server processes and recovery after a crash were briefly and inaccurately sketched, and rebackup after recovery was not discussed at all. The fundamental design involving three-way message transmission has remained unchanged. However, in addition to important changes in the implementation, server backup has been redesigned and is now more consistent with that of normal user processes. Recovery and rebackup have been completed in a less centralized and thus more efficient manner than previously envisioned. In this paper we review important aspects of the original design and note how the implementation differs from our original ideas. We then focus on the backup and recovery for server processes and the changes and additions in the design and implementation of recovery and rebackup.
A simple and general design uses message-based communication to provide software tolerance of single-point hardware failures. By delivering all interprocess messages to inactive backups for both the sender and the destination, both backups are kept in a state in which they can take over for their primaries. An implementation for the Auragen 4000 series of M68000-based systems is described. The operating system, Auros™, is a distributed version of UNIX*. Major goals have been transparency of fault tolerance and efficient execution in the absence of failure.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1983 ACM 0-89791-115-6/83/010/0090 $00.75
Introduction
This paper describes the design and implementation of message-based interprocess communication to support fault tolerant computing in an on-line transaction processing environment. The system assures that all executing processes will survive any single hardware failure. The scheme works efficiently and automatically; little processing overhead is incurred and no programmer or user awareness is required for fault tolerant operation. A simple and general design is presented in the first half of the paper. After that, we describe the details of our implementation, which is embedded in a distributed version of UNIX running on the Auragen 4000 computer. Section 2 reviews some existing methods for implementing fault tolerance. Section 3 describes the goals and scope of our work. In Sections 4, 5, and 6, we introduce the design and algorithms on which the Auragen message system implementation is based.
Section 7 describes our implementation, first summarizing the hardware design and then detailing the software supporting fault tolerance. In Section 8, we review the implementation with an eye toward prediction of the efficiency of the system.
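The three-way delivery described in the abstract above (each message reaches the destination, the destination's backup, and the sender's backup) can be illustrated with a minimal Python sketch. It is not the Auros implementation: the `Process` class, the `three_way_send` function, the `reachable` check, and the idea that the sender's backup records only a count of messages sent are all assumptions made for illustration.
```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for a primary or an inactive backup process."""
    name: str
    inbox: deque = field(default_factory=deque)  # messages queued for reading
    sent_count: int = 0  # assumption: a sender's backup only counts sends


def three_way_send(msg, dest_primary, dest_backup, sender_backup,
                   reachable=lambda proc: True):
    """Deliver msg to all three targets, or to none of them.

    `reachable` is a hypothetical health check; a real system would rely on
    hardware and bus-level guarantees rather than a callback.
    """
    targets = (dest_primary, dest_backup, sender_backup)
    if not all(reachable(t) for t in targets):
        return False  # all-or-nothing: no partial delivery
    dest_primary.inbox.append(msg)   # the destination consumes the message
    dest_backup.inbox.append(msg)    # its backup saves it for replay
    sender_backup.sent_count += 1    # sender's backup notes the send
    return True


a_backup = Process("A-backup")
b, b_backup = Process("B"), Process("B-backup")
delivered = three_way_send("req-1", b, b_backup, a_backup)
```
In this sketch, the destination's backup holds the same input as its primary, so it could replay that input after a failure, while the count kept by the sender's backup would let a recovering sender avoid resending messages that were already delivered.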
This paper presents the results of a simulation-based study of various translation lookaside buffer (TLB) architectures, in the context of a modern VLSI RISC processor. The simulators used address traces generated by instrumented versions of the SPECmarks and several other programs running on a DECstation 5000. The performance of two-level TLBs and fully-associative TLBs was investigated. The amount of memory mapped was found to be the dominant factor in TLB performance. Small first-level FIFO instruction TLBs can be effective in two-level TLB configurations. For some applications, the cycles-per-instruction (CPI) loss due to TLB misses can be reduced from as much as 5 CPI to negligible levels with typical TLB parameters through the use of variable-sized pages.
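A trace-driven TLB simulation of the kind the abstract mentions can be sketched in a few lines of Python. This is a toy model, not the study's simulator: the function names, the synthetic trace, the 4 KB default page size, and the miss penalty passed to `cpi_loss` are all assumptions chosen for illustration.
```python
from collections import deque


def tlb_misses(addresses, entries, page_size=4096):
    """Count misses for a fully associative TLB with FIFO replacement."""
    resident = set()   # page numbers currently mapped by the TLB
    order = deque()    # insertion order, oldest entry on the left
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in resident:
            continue   # hit: FIFO order is not updated on hits
        misses += 1
        if len(order) == entries:
            resident.discard(order.popleft())  # evict the oldest entry
        order.append(page)
        resident.add(page)
    return misses


def cpi_loss(misses, instructions, miss_penalty_cycles):
    """Cycles per instruction lost to TLB misses (hypothetical penalty)."""
    return misses * miss_penalty_cycles / instructions


# Synthetic trace: sweep over 8 consecutive 4 KB pages, four times.
trace = [p * 4096 for p in range(8)] * 4
small_tlb = tlb_misses(trace, entries=4)                    # maps 16 KB
large_tlb = tlb_misses(trace, entries=8)                    # maps 32 KB
big_pages = tlb_misses(trace, entries=4, page_size=32768)   # maps 128 KB
```
On this synthetic trace the 4-entry TLB misses on every access because the working set exceeds the memory it maps, while either doubling the entry count or enlarging the page size covers the whole working set. That is consistent with the abstract's point that the amount of memory mapped dominates TLB performance.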