2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) 2017
DOI: 10.1109/msr.2017.4
The Impact of Using Regression Models to Build Defect Classifiers

Cited by 53 publications (24 citation statements)
References 31 publications
“…The rationale behind the usage of this test was that the Scott-Knott ESD can be adopted to control dataset-specific performances: indeed, it evaluates the performances of the different prediction models on each dataset in isolation, thus ranking the top models based on their performances on each project. For this reason, we had 34 different Scott-Knott ranks that we analyzed by measuring the likelihood of a model to be in the top Scott-Knott ESD rank, as done in previous work [90], [118], [119].…”
Section: RQ1 - The Contribution of the Intensity Index
confidence: 99%
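The procedure described in the statement above — evaluating models per dataset with Scott-Knott ESD and then measuring how often each model lands in the top rank — can be sketched as follows. This is a minimal illustration of the likelihood computation only, not the cited authors' code; the rank table and model names are invented for the example, and the Scott-Knott ESD ranks themselves are assumed to be computed beforehand (e.g., with the ScottKnottESD R package).

```python
# Illustrative Scott-Knott ESD ranks per dataset (hypothetical values).
# Rank 1 is the top rank; ranks come from running the test on each
# dataset in isolation, as the quoted study describes.
ranks = {
    "dataset-1": {"rf": 1, "lr": 1, "nb": 2},
    "dataset-2": {"rf": 1, "lr": 2, "nb": 3},
    "dataset-3": {"rf": 2, "lr": 1, "nb": 2},
}

def top_rank_likelihood(ranks):
    """Fraction of datasets in which each model appears in the top rank."""
    models = {m for per_ds in ranks.values() for m in per_ds}
    return {
        m: sum(per_ds[m] == min(per_ds.values()) for per_ds in ranks.values())
           / len(ranks)
        for m in models
    }

print(top_rank_likelihood(ranks))
# e.g., "rf" is in the top rank for 2 of the 3 datasets -> 2/3
```

With 34 datasets, as in the quoted study, the same function would simply take a 34-entry rank table.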
“…Since there are many analytical learners that can be used to investigate the impact of correlated metrics on defect models, the aforementioned surveys guide our selection of the two commonly-used analytical learners: logistic regression [5,6,15,43,53,57,58,65,87] and random forest [23,24,38,55,64]. These techniques are two of the most commonly-used analytical learners for defect models and they have built-in techniques for model interpretation Figure 2: An overview diagram of the design of our case study.…”
Section: Techniques for Mitigating Correlated Metrics
confidence: 99%
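The two analytical learners named above each expose a built-in interpretation mechanism: coefficients for logistic regression and impurity-based importances for random forest. The sketch below shows both on synthetic data using scikit-learn; the data, feature count, and hyperparameters are our assumptions, not the cited study's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))   # three hypothetical software metrics
# Defect label driven mainly by the first two metrics.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

lr = LogisticRegression().fit(X, y)
rf = RandomForestClassifier(random_state=42).fit(X, y)

# Built-in interpretation: coefficients vs. feature importances.
print("logistic regression coefficients:", lr.coef_[0])
print("random forest importances:", rf.feature_importances_)
```

Both outputs rank metrics by contribution, which is exactly the kind of interpretation that correlated metrics can distort.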
“…Plenty of prior studies investigate the impact of many phenomena on code quality using software metrics, for example, code size, code complexity [31,49,71], change complexity [42,57,59,71,88], antipatterns [41], developer activity [71], developer experience [61], developer expertise [5], developer and reviewer knowledge [81], design [3,10,11,14,16], reviewer participation [50,82], code smells [40], and mutation testing [7]. To perform such studies, there are five common steps: (1) formulating of hypotheses that pertain to the phenomena that one wishes to study; (2) designing appropriate metrics to operationalize the intention behind the phenomena under study; (3) defining a model specification (e.g., the ordering of metrics) to be used when constructing an analytical model; (4) constructing an analytical model using, for example, regression models [5,57,81,82,87] or random forest models [23,38,55,64]; and (5) examining the ranking of metrics using a model interpretation technique (e.g., ANOVA Type-I, one of the most commonly-used interpretation techniques since it is the default built-in function for logistic regression (glm) models in R) in order to test the hypotheses.…”
Section: Introduction
confidence: 99%
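Step (3) and step (5) of the workflow above interact: ANOVA Type-I attributes variance to metrics sequentially, so the model specification (the ordering of metrics) changes each metric's apparent contribution when metrics are correlated. A minimal sketch of this effect, using statsmodels on synthetic data (the text mentions R's `glm`; this sketch uses an OLS model for simplicity, which is our assumption):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=200)   # strongly correlated with x1
y = x1 + rng.normal(size=200)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Type-I (sequential) ANOVA: the metric listed first absorbs the
# shared variance, so swapping the order changes the sums of squares.
a1 = anova_lm(smf.ols("y ~ x1 + x2", data=df).fit(), typ=1)
a2 = anova_lm(smf.ols("y ~ x2 + x1", data=df).fit(), typ=1)
print(a1["sum_sq"])
print(a2["sum_sq"])
```

The same hypothesis test can therefore reach different conclusions depending only on the order in which correlated metrics enter the model.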
“…In this work, we use PCA to best account for multicollinearity. Software metrics can be highly correlated to each other (Rajbahadur et al 2017) and highly correlated metrics (i.e., |ρ| > 0.7) can lead to an inflated variance in the estimation of the outcome.…”
Section: Varimax Transformation
confidence: 99%
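The use of PCA to account for multicollinearity, as in the statement above, can be sketched as follows: two metrics with |ρ| above the 0.7 threshold are projected onto principal components, which are uncorrelated by construction. This is an assumed minimal illustration with scikit-learn, not the cited authors' pipeline, and the metric names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
loc = rng.normal(size=500)                          # e.g., lines of code
cc = 0.9 * loc + rng.normal(scale=0.2, size=500)    # a correlated complexity metric
X = np.column_stack([loc, cc])

rho = np.corrcoef(X, rowvar=False)[0, 1]
print(f"|rho| = {abs(rho):.2f}")                    # above the 0.7 threshold

pcs = PCA().fit_transform(X)
pc_rho = np.corrcoef(pcs, rowvar=False)[0, 1]
print(f"component correlation = {pc_rho:.2e}")      # components are uncorrelated
```

Modeling on the components instead of the raw metrics avoids the inflated variance that correlated metrics would otherwise introduce.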