Proceedings of the 40th International Conference on Software Engineering 2018
DOI: 10.1145/3180155.3180197

Is "better data" better than "better data miners"?

Abstract: We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-performance criteria while (b) fixing the weaker regions of the training data (using SMOTUNED, which is an auto-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions when a…
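The sketch below illustrates the core idea behind tuning SMOTE's parameters rather than using its defaults. It is not the authors' SMOTUNED implementation: it assumes the imbalanced-learn and scikit-learn libraries, uses a synthetic dataset, and searches only k_neighbors over a small grid where the paper tunes several parameters with differential evolution.

```python
# Minimal sketch of tuning SMOTE's parameters (the idea behind SMOTUNED).
# Assumptions: imbalanced-learn + scikit-learn, synthetic data, grid search
# over k_neighbors only; NOT the authors' implementation.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=1)

best_auc, best_k = 0.0, None
for k in (3, 5, 7, 9):                      # candidate SMOTE neighbourhood sizes
    Xs, ys = SMOTE(k_neighbors=k, random_state=1).fit_resample(X_tr, y_tr)
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
    if auc > best_auc:
        best_auc, best_k = auc, k

print(f"best k_neighbors = {best_k}, validation AUC = {best_auc:.3f}")
```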

Cited by 134 publications (58 citation statements)
References 81 publications (69 reference statements)
“…EPV (Events Per Variable), a measure of the risk of overfitting, is the ratio of events to the number of independent variables used to train a model. Formally, EPV = #events (e.g., #defective modules) / #variables, where the number of events is the number of occurrences of the least frequently occurring class of the dependent variable (e.g., the number of defective modules), and the variables are the independent variables used to train the model (i.e., the software metrics) [85]. Recently, Tantithamthavorn et al [85] demonstrated that models that are trained using datasets where the EPV is low (i.e., too few events are available relative to the number of independent variables) are especially susceptible to overfitting (i.e., being fit too closely to the training data).…”
Section: Statistical Analysis of the Experimental Settings
confidence: 99%
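A small worked example of the EPV ratio defined above; the counts are illustrative and not taken from the paper.

```python
# Hypothetical example: computing EPV (Events Per Variable) for a defect dataset.
n_defective = 36      # events: modules in the minority (defective) class
n_clean = 264         # majority-class modules
n_metrics = 20        # independent variables (software metrics) used to train the model

events = min(n_defective, n_clean)   # least frequently occurring class
epv = events / n_metrics
print(f"EPV = {epv:.1f}")            # EPV = 1.8

# Low-EPV datasets (e.g., EPV well below 10) are flagged as prone to overfitting.
if epv < 10:
    print("Low EPV: the fitted model may be unstable / overfitted.")
```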
“…To minimize the threats, we use the F1 measure, which has been widely used in current SDP studies and strikes a good balance between precision and recall. The AUC measure has also been widely used in SDP studies since it is insensitive to class-imbalanced data and does not depend on an arbitrarily selected threshold…”
Section: Experiments Design and Results Analysis
confidence: 99%
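A minimal sketch of scoring a defect predictor with both measures mentioned above, assuming scikit-learn and a synthetic imbalanced dataset (placeholders, not the cited study's data or model).

```python
# Scoring a classifier with F1 (threshold-dependent) and AUC (threshold-free).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# F1 balances precision and recall at a fixed decision threshold (default 0.5).
f1 = f1_score(y_te, model.predict(X_te))
# AUC ranks predicted probabilities, so it does not depend on a chosen threshold.
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")
```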
“…DEO is an optimizer that is especially suitable for functions that may not be smooth or linear. It has been widely used in previous studies for hyperparameter optimization. According to a previous study, DEO has more competitive advantages when compared with other metaheuristic search algorithms, such as particle swarm optimization or genetic algorithms.…”
Section: Our Proposed Mask Approach
confidence: 99%
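A minimal sketch of differential evolution used for hyperparameter optimization, assuming "DEO" refers to a differential evolution optimizer. The search space, learner, and data below are illustrative, not the cited paper's setup.

```python
# Differential evolution tuning an SVM's C and gamma by maximizing cross-validated AUC.
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=1)

def objective(params):
    log_c, log_gamma = params
    clf = SVC(C=10 ** log_c, gamma=10 ** log_gamma)
    auc = cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()
    return 1.0 - auc                 # DE minimizes, so invert the goal

# Search log10(C) in [-2, 3] and log10(gamma) in [-4, 1].
result = differential_evolution(objective, bounds=[(-2, 3), (-4, 1)],
                                maxiter=20, popsize=10, seed=1)
print("best log10(C), log10(gamma):", result.x, "AUC:", 1 - result.fun)
```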
“…For building the defect predictors in this study, we elected to use Simple Logistic, Naive Bayes, Expectation Maximization, and Support Vector Machines. We chose these learners because past studies show that, for defect prediction tasks, these four learners represent four different levels of performance among a wide range of learners [3,21]. Thus they are selected as the state-of-the-art learners to be compared with FFTs on the defect prediction data.…”
Section: Learner Bias
confidence: 99%
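A rough sketch of comparing such learners under cross-validation, assuming scikit-learn analogues: LogisticRegression stands in for WEKA's Simple Logistic, and Expectation Maximization (a clusterer in WEKA) is omitted because it has no direct supervised counterpart here. Data and scores are illustrative only.

```python
# Comparing several defect-prediction learners with cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=1)

learners = {
    "Simple Logistic (approx.)": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(),
}

for name, clf in learners.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:28s} AUC = {auc:.3f}")
```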