Proceedings of the 40th International Conference on Software Engineering 2018
DOI: 10.1145/3180155.3180197

Is "better data" better than "better data miners"?

Abstract: We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-performance criteria while (b) fixing the weaker regions of the training data (using SMOTUNED, which is an auto-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions when a…
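The sketch below illustrates the core idea behind tuning SMOTE's parameters rather than using its defaults. It is not the authors' SMOTUNED implementation: it assumes the imbalanced-learn and scikit-learn libraries, uses a synthetic dataset, and searches only k_neighbors over a small grid where the paper tunes several parameters with differential evolution.

```python
# Minimal sketch of tuning SMOTE's parameters (the idea behind SMOTUNED).
# Assumptions: imbalanced-learn + scikit-learn, synthetic data, grid search
# over k_neighbors only; NOT the authors' implementation.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=1)

best_auc, best_k = 0.0, None
for k in (3, 5, 7, 9):                      # candidate SMOTE neighbourhood sizes
    Xs, ys = SMOTE(k_neighbors=k, random_state=1).fit_resample(X_tr, y_tr)
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
    if auc > best_auc:
        best_auc, best_k = auc, k

print(f"best k_neighbors = {best_k}, validation AUC = {best_auc:.3f}")
```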

Cited by 134 publications (58 citation statements)
References 81 publications (69 reference statements)
“…EPV (Events Per Variable), a measure of the risk of overfitting, is the ratio of events to the number of independent variables used to train a model. Formally, EPV = #events (e.g., #defective modules) / #variables, where the number of events is the number of occurrences of the least frequently occurring class of the dependent variable (e.g., the number of defective modules), and the variables are the independent variables used to train the model (i.e., the software metrics) [85]. Recently, Tantithamthavorn et al [85] demonstrated that models that are trained using datasets where the EPV is low (i.e., too few events are available relative to the number of independent variables) are especially susceptible to overfitting (i.e., being fit too closely to the training data).…”
Section: Statistical Analysis of the Experimental Settings
confidence: 99%
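A small worked example of the EPV ratio defined above; the counts are illustrative and not taken from the paper.

```python
# Hypothetical example: computing EPV (Events Per Variable) for a defect dataset.
n_defective = 36      # events: modules in the minority (defective) class
n_clean = 264         # majority-class modules
n_metrics = 20        # independent variables (software metrics) used to train the model

events = min(n_defective, n_clean)   # least frequently occurring class
epv = events / n_metrics
print(f"EPV = {epv:.1f}")            # EPV = 1.8

# Low-EPV datasets (e.g., EPV well below 10) are flagged as prone to overfitting.
if epv < 10:
    print("Low EPV: the fitted model may be unstable / overfitted.")
```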
“…To minimize the threats, we use the F1 measure, which has been widely used in current SDP studies and strikes a good balance between precision and recall. The AUC measure has also been widely used in SDP studies since it is insensitive to class-imbalanced data and does not depend on an arbitrarily selected threshold…”
Section: Experiments Design and Results Analysis
confidence: 99%
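A minimal sketch of scoring a defect predictor with both measures mentioned above, assuming scikit-learn and a synthetic imbalanced dataset (placeholders, not the cited study's data or model).

```python
# Scoring a classifier with F1 (threshold-dependent) and AUC (threshold-free).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# F1 balances precision and recall at a fixed decision threshold (default 0.5).
f1 = f1_score(y_te, model.predict(X_te))
# AUC ranks predicted probabilities, so it does not depend on a chosen threshold.
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")
```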
“…DEO is an optimizer that is especially suitable for functions that may not be smooth or linear. It has been widely used in previous studies for hyperparameter optimization. According to a previous study, DEO has more competitive advantages when compared with other metaheuristic search algorithms, such as particle swarm optimization or genetic algorithms.…”
Section: Our Proposed Mask Approach
confidence: 99%
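A minimal sketch of differential evolution used for hyperparameter optimization, assuming "DEO" refers to a differential evolution optimizer. The search space, learner, and data below are illustrative, not the cited paper's setup.

```python
# Differential evolution tuning an SVM's C and gamma by maximizing cross-validated AUC.
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=1)

def objective(params):
    log_c, log_gamma = params
    clf = SVC(C=10 ** log_c, gamma=10 ** log_gamma)
    auc = cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()
    return 1.0 - auc                 # DE minimizes, so invert the goal

# Search log10(C) in [-2, 3] and log10(gamma) in [-4, 1].
result = differential_evolution(objective, bounds=[(-2, 3), (-4, 1)],
                                maxiter=20, popsize=10, seed=1)
print("best log10(C), log10(gamma):", result.x, "AUC:", 1 - result.fun)
```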
“…For building the defect predictors in this study, we elected to use Simple Logistic, Naive Bayes, Expectation Maximization, and Support Vector Machines. We chose these learners because past studies show that, for defect prediction tasks, these four learners represent four different levels of performance among a wide range of learners [3,21]. Thus they are selected as the state-of-the-art learners to be compared with FFTs on the defect prediction data.…”
Section: Learner Bias
confidence: 99%
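A rough sketch of comparing such learners under cross-validation, assuming scikit-learn analogues: LogisticRegression stands in for WEKA's Simple Logistic, and Expectation Maximization (a clusterer in WEKA) is omitted because it has no direct supervised counterpart here. Data and scores are illustrative only.

```python
# Comparing several defect-prediction learners with cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=1)

learners = {
    "Simple Logistic (approx.)": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(),
}

for name, clf in learners.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:28s} AUC = {auc:.3f}")
```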