2005
DOI: 10.1109/tse.2005.58

Reliability and validity in comparative studies of software prediction models

Abstract: Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation and compar…
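The research procedure the abstract describes (a single data sample, an accuracy indicator, and cross validation) can be made concrete with a short sketch. The code below is not the authors' code: the synthetic size/effort dataset, the choice of MMRE as the accuracy indicator, and the pairing of linear regression against a decision tree are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code) of the procedure the
# abstract describes: one data sample, one accuracy indicator (MMRE), and
# cross validation used to compare a regression model with an ML model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic "single data sample": effort grows with size, plus noise.
n = 100
size = rng.uniform(10, 1000, n)                  # e.g. size in function points
effort = 2.5 * size * rng.lognormal(0, 0.3, n)   # e.g. effort in person-hours
X, y = size.reshape(-1, 1), effort

def mmre(actual, predicted):
    """Mean magnitude of relative error: mean(|y - yhat| / y)."""
    return float(np.mean(np.abs(actual - predicted) / actual))

models = {
    "regression": LinearRegression(),
    "machine learning": DecisionTreeRegressor(max_depth=3, random_state=0),
}

# 10-fold cross validation on the single sample.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    fold_scores = []
    for train_idx, test_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(mmre(y[test_idx], model.predict(X[test_idx])))
    print(f"{name}: MMRE = {np.mean(fold_scores):.3f}")
```

Re-running the sketch with different random seeds illustrates the concern the paper raises: which model "wins" under a single sample and a single accuracy indicator can change from run to run.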

Cited by 207 publications (174 citation statements); references 37 publications. Citing publications range from 2006 to 2017.
“…First, no single prediction technique dominates [3] and, second, making sense of the many prediction results is hampered by the use of different data sets, data pre-processing, validation schemes and performance statistics [4], [3], [5], [6]. These differences are compounded by the lack of any agreed reporting protocols or even the need to share code and algorithms [7].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…However, there are still large discrepancies regarding the assessment of the goodness of the different techniques and the reasons for such discrepancies [44,60,39]. For example, Lessmann et al. [33] compare 22 classifiers grouped into statistical methods, nearest neighbour methods, neural networks, support vector machines, decision trees and ensemble methods over ten datasets from the NASA repository.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
“…The result can be multiplied by 100 to get the percentage of deviation from the actual value. The MMRE is the mean of the MRE; it is one of the most widely used criteria for assessing the performance of software prediction models [30,31]. Table 8 shows the MRE values in the data set.…”
Section: Model Validation (citation type: mentioning; confidence: 99%)
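For reference, the MRE and MMRE named in this excerpt are standard accuracy measures; using the usual definitions (notation mine, not taken from the cited paper), for an actual value $y_i$ and a predicted value $\hat{y}_i$ over $n$ observations:

$$\mathrm{MRE}_i = \frac{\lvert y_i - \hat{y}_i \rvert}{y_i}, \qquad \mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i$$

For example, an actual effort of 100 person-hours predicted as 80 gives MRE = |100 − 80| / 100 = 0.20, i.e. a 20% deviation once multiplied by 100, as the excerpt notes.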