Abstract-This paper presents the results of an empirical study of software error detection using self checks and N-version voting. A total of 24 graduate students in computer science at the University of Virginia and the University of California, Irvine, were hired as programmers. Working independently, each first prepared a set of self checks using just the requirements specification of an aerospace application, and then each added self checks to an existing implementation of that specification. The modified programs were executed to measure the error-detection performance of the checks and to compare this with error detection using simple voting among multiple versions. The goal of this study was to learn more about the effectiveness of such checks. The analysis of the checks revealed that there are great differences in the ability of individual programmers to design effective checks. We found that some checks that might have been effective failed to detect an error because they were badly placed, and there were numerous instances of checks signaling nonexistent errors. In general, specification-based checks alone were not as effective as combining them with code-based checks. Using self checks, faults were identified that had not been detected previously by voting 28 versions of the program over a million randomly-generated inputs. This appeared to result from the fact that the self checks could examine the internal state of the executing program whereas voting examines only final results of computations. If internal states had to be identical in N-version voting systems, then there would be no reason to write multiple versions. The programs were executed on 100 000 new randomly-generated input cases in order to compare error detection by self checks and by 2-version and 3-version voting.
Both self checks and voting techniques led to the identification of the same number of faults for this input, although the identified faults were not the same. Furthermore, whereas the self checks were always effective at detecting an error caused by a particular fault (if they ever did), N-version voting triples and pairs were only partially effective at detecting the failures caused by particular faults. Finally, checking the internal state with self checks also resulted in finding faults that did not cause failures for the particular input cases executed. This has important implications for the use of back-to-back testing.

Index Terms-Acceptance tests, assertions, error detection, N-version programming, software fault tolerance, software reliability.
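
The distinction drawn above, between a self check that examines internal state and a voter that compares only final results, can be illustrated with a minimal Python sketch. Nothing here is taken from the study: the toy task (finding the largest value), the three versions, the seeded fault, and every name (`largest_v1`, `largest_v3`, `SelfCheckFailure`, `compare`) are hypothetical and chosen only for illustration.

```python
class SelfCheckFailure(Exception):
    """Raised when a self check finds an inconsistent internal state."""


def largest_v1(values):
    # Hypothetical version 1: direct scan.
    return max(values)


def largest_v2(values):
    # Hypothetical version 2: sort, then take the last element.
    return sorted(values)[-1]


def largest_v3(values, self_check=False):
    # Hypothetical version 3: insertion sort with a seeded fault.
    # The loop should start at index 1; starting at 2 leaves the
    # first two elements unordered relative to each other.
    buf = list(values)
    for i in range(2, len(buf)):
        key = buf[i]
        j = i - 1
        while j >= 1 and buf[j] > key:
            buf[j + 1] = buf[j]
            j -= 1
        buf[j + 1] = key
    # Self check on INTERNAL state: the buffer must be sorted.
    if self_check and any(a > b for a, b in zip(buf, buf[1:])):
        raise SelfCheckFailure("internal buffer is not sorted")
    return buf[-1]


def voting_detects_error(results):
    # Simple voting sees only final results: it flags an error
    # when the versions disagree.
    return len(set(results)) > 1


def compare(values):
    """Return (detected_by_voting, detected_by_self_check)."""
    results = [largest_v1(values), largest_v2(values), largest_v3(values)]
    by_voting = voting_detects_error(results)
    try:
        largest_v3(values, self_check=True)
        by_check = False
    except SelfCheckFailure:
        by_check = True
    return by_voting, by_check


print(compare([1, 2, 3, 4]))  # fault not triggered
print(compare([9, 2, 5, 4]))  # fault causes a wrong final result
print(compare([5, 2, 9, 4]))  # fault corrupts state, result still correct
```

In this sketch, the input `[5, 2, 9, 4]` leaves the faulty version's buffer unsorted while its final result is still correct, so the three versions agree and voting reports nothing, whereas the self check flags the inconsistent internal state. This is only a toy analogue of the abstract's observation that self checks found faults that did not cause failures on the inputs executed.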