In many computer systems, the contents of memory are protected by an error detection and correction (EDAC) code. Bit-flips caused by single event upsets (SEUs) are a well-known problem in memory chips, and EDAC codes have been an effective solution to this problem. These codes are usually implemented in hardware using extra memory bits and encoding/decoding circuitry. In systems where EDAC hardware is not available, the reliability of the system can be improved by providing protection through software. Codes and techniques that can be used for software implementation of EDAC are discussed and compared.

The objective of our experiment is to see whether software-implemented hardware fault tolerance, which can include software-implemented EDAC, can provide sufficient reliability for COTS hardware to make it usable in low-radiation space applications.

Power fluctuation and electromagnetic interference may cause bit-flips in memories. It has been observed that radiation-induced transient errors also occur at ground level [8]. Therefore, the technique presented in this paper can be useful for terrestrial applications, too.

Previous discussions of software-implemented EDAC concentrate on communications and secondary storage systems [9][10][11][12][13][14]. In Sec. 2, we review some of these previous studies. In Sec. 3, we look at the problem in more detail and discuss the requirements of a scheme for the particular application presented above. Four example EDAC coding schemes were implemented in software; these schemes are compared in Sec. 4. Issues that have to be considered for handling multiple errors, and solutions to them, are discussed in Sec. 5. Finally, the EDAC program has to be integrated into the whole system; we present our implementation in ARGOS in Sec. 6. The reliability improvement of an application in a space environment is estimated in Sec. 7. We conclude the paper with a discussion in Sec. 8.
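To make the idea of software-implemented EDAC concrete, the following is a minimal sketch of single-error correction done entirely in software. It uses a textbook Hamming(7,4) code purely as an illustration; the excerpt does not name the four schemes the paper compares, so this should not be read as one of them. The function names `hamming74_encode` and `hamming74_decode` are hypothetical.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword positions are numbered 1..7; parity bits sit at
    positions 1, 2 and 4, data bits at positions 3, 5, 6 and 7.
    """
    c = [0] * 8  # index 0 is unused so indices match positions
    c[3] = (nibble >> 0) & 1
    c[5] = (nibble >> 1) & 1
    c[6] = (nibble >> 2) & 1
    c[7] = (nibble >> 3) & 1
    c[1] = c[3] ^ c[5] ^ c[7]  # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]  # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]  # covers positions 4, 5, 6, 7
    return sum(c[pos] << (pos - 1) for pos in range(1, 8))


def hamming74_decode(word: int) -> Tuple[int, bool]:
    """Decode a 7-bit word, correcting at most one flipped bit.

    Returns (data_nibble, corrected), where corrected is True if a
    bit-flip was detected and repaired.
    """
    c = [0] + [(word >> (pos - 1)) & 1 for pos in range(1, 8)]
    # The syndrome is the XOR of the positions of all set bits; it is
    # zero for a valid codeword and equals the position of a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    corrected = syndrome != 0
    if corrected:
        c[syndrome] ^= 1
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, corrected


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    upset = stored ^ (1 << 4)  # simulate an SEU flipping one stored bit
    print(hamming74_decode(upset))
```

A software scheme of this kind would typically re-read and re-encode protected memory periodically so that single bit-flips are repaired before a second upset accumulates in the same codeword. A code this small cannot distinguish a double error from a single one, so handling multiple errors needs a stronger scheme.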
Redundant systems are designed using multiple copies of the same resource (e.g., a logic network or a software module) in order to increase system dependability. Design diversity has long been used to protect redundant systems from common-mode failures. The conventional notion of diversity relies on "independent" generation of "different" implementations. This concept is qualitative and does not provide a basis for comparing the reliabilities of two diverse systems. In this paper, for the first time, we present a metric to quantify diversity among several designs and illustrate its effectiveness using several examples. Applications of this metric in analyzing reliability and availability of diverse redundant systems, and deriving simple relationships between diversity, system failure rate, and mission time are also demonstrated.