Society relies on telecommunications to such an extent that telecommunications software must have high reliability. Enhanced measurement for early risk assessment of latent defects (EMERALD) is a joint project of Nortel and Bell Canada for improving the reliability of telecommunications software products. This paper reports a case study of neural-network modeling techniques developed for the EMERALD system. The resulting neural network is currently in the prototype testing phase at Nortel. Neural-network models can be used to identify fault-prone modules for extra attention early in development, and thus reduce the risk of operational problems with those modules. We modeled a subset of modules representing over seven million lines of code from a very large telecommunications software system. The set consisted of those modules reused with changes from the previous release. The dependent variable was membership in the class of fault-prone modules. The independent variables were principal components of nine measures of software design attributes. We compared the neural-network model with a nonparametric discriminant model and found the neural-network model had better predictive accuracy.
Software faults are defects in software modules that might cause failures. Software developers tend to focus on faults, because they are closely related to the amount of rework necessary to prevent future operational software failures. The goal of this paper is to predict which modules are fault-prone and to do it early enough in the life cycle to be useful to developers. A regression tree is an algorithm represented by an abstract tree, where the response variable is a real quantity. Software modules are classified as fault-prone or not by comparing the predicted value to a threshold. A classification rule is proposed that allows one to choose a preferred balance between the two types of misclassification rates. A case study of a very large telecommunications system considered software modules to be fault-prone if any faults were discovered by customers. Our research shows that classifying fault-prone modules with regression trees and using the classification rule in this paper resulted in predictions with satisfactory accuracy and robustness.
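The threshold-based rule described above can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the following is an assumption: a module is called fault-prone when its predicted value exceeds a threshold, and the threshold is picked from a list of candidates so that the Type I rate (not fault-prone modules flagged as fault-prone) and the Type II rate (fault-prone modules missed) are as close to equal as possible. The function names and the "balanced rates" criterion are hypothetical, not taken from the paper.

```python
def misclassification_rates(predicted, is_fault_prone, threshold):
    """Return (type_i_rate, type_ii_rate) for one threshold.

    predicted      -- real-valued predictions, e.g. from a regression tree
    is_fault_prone -- actual class of each module (True = fault-prone)
    """
    pairs = list(zip(predicted, is_fault_prone))
    n_not_fp = sum(1 for _, fp in pairs if not fp)
    n_fp = sum(1 for _, fp in pairs if fp)
    # Type I: a not fault-prone module predicted above the threshold.
    type_i = sum(1 for p, fp in pairs if not fp and p > threshold)
    # Type II: a fault-prone module predicted at or below the threshold.
    type_ii = sum(1 for p, fp in pairs if fp and p <= threshold)
    return type_i / n_not_fp, type_ii / n_fp


def choose_threshold(predicted, is_fault_prone, candidates):
    """Pick the candidate whose two error rates are most nearly equal.

    This balance criterion is an assumption made for illustration only.
    """
    def imbalance(threshold):
        type_i, type_ii = misclassification_rates(
            predicted, is_fault_prone, threshold)
        return abs(type_i - type_ii)
    return min(candidates, key=imbalance)


def classify(predicted_value, threshold):
    """Label a module fault-prone when its prediction exceeds the threshold."""
    return predicted_value > threshold
```

A different preferred balance, for example tolerating more Type I errors to reduce Type II errors, could be expressed by changing the `imbalance` function rather than the classifier itself.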
Society has become so dependent on reliable telecommunications, that failures can risk loss of emergency service, business disruptions, or isolation from friends. Consequently, telecommunications software is required to have high reliability. Many previous studies define the classification fault-prone in terms of fault counts. This study defines fault-prone as exceeding a threshold of debug code churn, defined as the number of lines added or changed due to bug fixes. Previous studies have characterized reuse history with simple categories. This study quantified new functionality with lines of code. This paper analyzes two consecutive releases of a large legacy software system for telecommunications. We applied discriminant analysis to identify fault-prone modules based on 16 static software product metrics and the amount of code changed during development. Modules from one release were used as a fit data set and modules from the subsequent release were used as a test data set. In contrast, comparable prior studies of legacy systems split the data to simulate two releases. We validated the model with a realistic simulation of utilization of the fitted model with the test data set. Model results could be used to give extra attention to fault-prone modules and thus, reduce the risk of unexpected problems.
Reliable software is mandatory for complex mission-critical systems. Classifying modules as fault-prone, or not, is a valuable technique for guiding development processes, so that resources can be focused on those parts of a system that are most likely to have faults. Logistic regression offers advantages over other classification modeling techniques, such as interpretable coefficients. There are few prior applications of logistic regression to software quality models in the literature, and none that we know of account for prior probabilities and costs of misclassification. A contribution of this paper is the application of prior probabilities and costs of misclassification to a logistic regression-based classification rule for a software quality model. This paper also contributes an integrated method for using logistic regression in software quality modeling, including examples of how to interpret coefficients, how to use prior probabilities, and how to use costs of misclassifications. A case study of a major subsystem of a military, real-time system illustrates the techniques.
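As a rough illustration of how prior probabilities and misclassification costs can enter a logistic regression-based rule, the sketch below uses a generic minimum-expected-cost decision. It is not the paper's rule, which the abstract does not state. The assumptions are: the model's odds are rescaled from the fit sample's class proportion to the assumed prior, and a module is called fault-prone when the expected cost of missing it (Type II) exceeds the expected cost of a false alarm (Type I). All function and parameter names are hypothetical.

```python
import math


def logistic_probability(intercept, coefficients, metrics):
    """Estimated probability that a module is fault-prone."""
    z = intercept + sum(b * x for b, x in zip(coefficients, metrics))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coefficient):
    """Interpretation aid: multiplicative change in the odds of being
    fault-prone per unit increase in the corresponding metric."""
    return math.exp(coefficient)


def is_fault_prone(p_model, prior_fp, sample_fp, cost_type_i, cost_type_ii):
    """Minimum-expected-cost rule (an illustrative assumption).

    p_model      -- probability from the fitted logistic model (0 < p < 1)
    prior_fp     -- assumed prior probability of the fault-prone class
    sample_fp    -- proportion of fault-prone modules in the fit data
    cost_type_i  -- cost of flagging a module that is not fault-prone
    cost_type_ii -- cost of missing a module that is fault-prone
    """
    model_odds = p_model / (1.0 - p_model)
    # Rescale the odds from the fit-sample proportion to the assumed prior.
    prior_odds = prior_fp / (1.0 - prior_fp)
    sample_odds = sample_fp / (1.0 - sample_fp)
    adjusted_odds = model_odds * prior_odds / sample_odds
    # Flag the module when missing it is the more expensive error on average.
    return adjusted_odds * cost_type_ii > cost_type_i
```

With equal costs and a prior equal to the sample proportion, this reduces to flagging modules whose estimated probability exceeds 0.5; raising `cost_type_ii` relative to `cost_type_i` lowers the effective cutoff.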