2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) 2017
DOI: 10.1109/msr.2017.4
The Impact of Using Regression Models to Build Defect Classifiers

Cited by 53 publications (24 citation statements)
References 31 publications
“…The rationale behind the usage of this test was that the Scott-Knott ESD can be adopted to control dataset-specific performances: indeed, it evaluates the performances of the different prediction models on each dataset in isolation, thus ranking the top models based on their performances on each project. For this reason, we had 34 different Scott-Knott ranks that we analyzed by measuring the likelihood of a model to be in the top Scott-Knott ESD rank, as done in previous work [90], [118], [119].…”
Section: RQ1 - The Contribution of the Intensity Index
confidence: 99%
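The procedure described in the statement above — evaluating models per dataset with Scott-Knott ESD and then measuring how often each model lands in the top rank — can be sketched as follows. This is a minimal illustration of the likelihood computation only, not the cited authors' code; the rank table and model names are invented for the example, and the Scott-Knott ESD ranks themselves are assumed to be computed beforehand (e.g., with the ScottKnottESD R package).

```python
# Illustrative Scott-Knott ESD ranks per dataset (hypothetical values).
# Rank 1 is the top rank; ranks come from running the test on each
# dataset in isolation, as the quoted study describes.
ranks = {
    "dataset-1": {"rf": 1, "lr": 1, "nb": 2},
    "dataset-2": {"rf": 1, "lr": 2, "nb": 3},
    "dataset-3": {"rf": 2, "lr": 1, "nb": 2},
}

def top_rank_likelihood(ranks):
    """Fraction of datasets in which each model appears in the top rank."""
    models = {m for per_ds in ranks.values() for m in per_ds}
    return {
        m: sum(per_ds[m] == min(per_ds.values()) for per_ds in ranks.values())
           / len(ranks)
        for m in models
    }

print(top_rank_likelihood(ranks))
# e.g., "rf" is in the top rank for 2 of the 3 datasets -> 2/3
```

With 34 datasets, as in the quoted study, the same function would simply take a 34-entry rank table.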
“…Since there are many analytical learners that can be used to investigate the impact of correlated metrics on defect models, the aforementioned surveys guide our selection of the two commonly-used analytical learners: logistic regression [5,6,15,43,53,57,58,65,87] and random forest [23,24,38,55,64]. These techniques are two of the most commonly-used analytical learners for defect models and they have built-in techniques for model interpretation Figure 2: An overview diagram of the design of our case study.…”
Section: Techniques for Mitigating Correlated Metrics
confidence: 99%
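The two analytical learners named above each expose a built-in interpretation mechanism: coefficients for logistic regression and impurity-based importances for random forest. The sketch below shows both on synthetic data using scikit-learn; the data, feature count, and hyperparameters are our assumptions, not the cited study's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))   # three hypothetical software metrics
# Defect label driven mainly by the first two metrics.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

lr = LogisticRegression().fit(X, y)
rf = RandomForestClassifier(random_state=42).fit(X, y)

# Built-in interpretation: coefficients vs. feature importances.
print("logistic regression coefficients:", lr.coef_[0])
print("random forest importances:", rf.feature_importances_)
```

Both outputs rank metrics by contribution, which is exactly the kind of interpretation that correlated metrics can distort.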
“…Plenty of prior studies investigate the impact of many phenomena on code quality using software metrics, for example, code size, code complexity [31,49,71], change complexity [42,57,59,71,88], antipatterns [41], developer activity [71], developer experience [61], developer expertise [5], developer and reviewer knowledge [81], design [3,10,11,14,16], reviewer participation [50,82], code smells [40], and mutation testing [7]. To perform such studies, there are five common steps: (1) formulating of hypotheses that pertain to the phenomena that one wishes to study; (2) designing appropriate metrics to operationalize the intention behind the phenomena under study; (3) defining a model specification (e.g., the ordering of metrics) to be used when constructing an analytical model; (4) constructing an analytical model using, for example, regression models [5,57,81,82,87] or random forest models [23,38,55,64]; and (5) examining the ranking of metrics using a model interpretation technique (e.g., ANOVA Type-I, one of the most commonly-used interpretation techniques since it is the default built-in function for logistic regression (glm) models in R) in order to test the hypotheses.…”
Section: Introduction
confidence: 99%
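Step (3) and step (5) of the workflow above interact: ANOVA Type-I attributes variance to metrics sequentially, so the model specification (the ordering of metrics) changes each metric's apparent contribution when metrics are correlated. A minimal sketch of this effect, using statsmodels on synthetic data (the text mentions R's `glm`; this sketch uses an OLS model for simplicity, which is our assumption):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=200)   # strongly correlated with x1
y = x1 + rng.normal(size=200)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Type-I (sequential) ANOVA: the metric listed first absorbs the
# shared variance, so swapping the order changes the sums of squares.
a1 = anova_lm(smf.ols("y ~ x1 + x2", data=df).fit(), typ=1)
a2 = anova_lm(smf.ols("y ~ x2 + x1", data=df).fit(), typ=1)
print(a1["sum_sq"])
print(a2["sum_sq"])
```

The same hypothesis test can therefore reach different conclusions depending only on the order in which correlated metrics enter the model.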
“…In this work, we use PCA to best account for multicollinearity. Software metrics can be highly correlated to each other (Rajbahadur et al 2017) and highly correlated metrics (i.e., |ρ| > 0.7) can lead to an inflated variance in the estimation of the outcome.…”
Section: Varimax Transformation
confidence: 99%
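The use of PCA to account for multicollinearity, as in the statement above, can be sketched as follows: two metrics with |ρ| above the 0.7 threshold are projected onto principal components, which are uncorrelated by construction. This is an assumed minimal illustration with scikit-learn, not the cited authors' pipeline, and the metric names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
loc = rng.normal(size=500)                          # e.g., lines of code
cc = 0.9 * loc + rng.normal(scale=0.2, size=500)    # a correlated complexity metric
X = np.column_stack([loc, cc])

rho = np.corrcoef(X, rowvar=False)[0, 1]
print(f"|rho| = {abs(rho):.2f}")                    # above the 0.7 threshold

pcs = PCA().fit_transform(X)
pc_rho = np.corrcoef(pcs, rowvar=False)[0, 1]
print(f"component correlation = {pc_rho:.2e}")      # components are uncorrelated
```

Modeling on the components instead of the raw metrics avoids the inflated variance that correlated metrics would otherwise introduce.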