An Empirical Study on the Effectiveness of Feature Selection for Cross-Project Defect Prediction

Yu, Qiao; Qian, Junyan; Jiang, Shujuan; Wu, Zhenhua; Zhang, Gongjie

doi:10.1109/access.2019.2895614

Cited by 38 publications

(20 citation statements)

References 35 publications

“…For traditional module-level defect prediction, many studies use feature selection to improve prediction performance [66], [67]. Laradji et al [68] carefully combined ensemble learning with efficient feature selection to address the issues of correlation, feature irrelevance, and so on.…”

Section: B. Feature Selection Methods (mentioning)

confidence: 99%

Automatic Feature Exploration and an Application in Defect Prediction

Qiu, Liu, et al. 2019

IEEE Access

Many software engineering tasks rely heavily on hand-crafted software features, e.g., defect prediction, vulnerability discovery, software requirements, code review, and malware detection. Previous solutions to these tasks usually apply the hand-crafted features directly, or after feature selection, for classification or regression, which usually leads to suboptimal results because the hand-crafted features lack powerful representations. To address this problem, in this paper, we adopt effort-aware just-in-time software defect prediction (JIT-SDP), a typical hand-crafted-feature-based task, as an example to explore new possible solutions. We propose a new model, named neural forest (NF), which uses a deep neural network and a decision forest to build a holistic system for the automatic exploration of powerful feature representations that are then used for classification. NF first employs a deep neural network to learn new feature representations from hand-crafted features. Then, a decision forest is connected after the neural network to perform classification and, in the meantime, to guide the learning of the feature representation. NF mainly aims at solving the challenging problem of combining the two different worlds of neural networks and decision forests in an end-to-end manner. When compared with previous state-of-the-art defect predictors and five designed baselines on six well-known benchmarks for within- and cross-project defect prediction, NF achieves significantly better results. The proposed NF model is generic to classification problems that rely on hand-crafted features.

Index terms: feature exploration, hand-crafted features, defect prediction.

“…Frequency References Data Normalization 11 [20], [55], [56], [72], [76], [79], [85], [88], [92], [104] [57], [80], [84], [87], [89] Data Normalization, and Feature Selection 4 [90], [91], [116], [117] Data Normalization, and Data Filtering 4 [58], [75], [82], [93] Data Imbalance, and Data Filtering 1 [94] Data Filtering, and Feature Selection 3 [31], [97], [100] Data Imbalance, and Feature Selection 1 [40] Data Normalization, Data Imbalance, and Data Filtering 5 [77], [78], [81], [83], [86] Data Normalization, Data Imbalance, and Feature Selection • deep belief network based on abstract syntax tree [108], [113] • correlation-based feature selection for feature subset selection [100], [111] • improved subclass discriminant analysis [61] • information flow algorithm [97] • feature selection using clusters of hybrid-data approach [59] • top-k feature subset based on number of occurrences of different metrics [109] • geodesic flow kernel feature selection [110] • similarity measure …”

Section: Techniques (mentioning)

confidence: 99%

Cross-Project Defect Prediction: A Literature Review

Pal, Sillitti, 2022

IEEE Access

Background: Software defect prediction models aim at identifying the potentially faulty modules of a software project based on historical data collected from previous versions of the same project. Because software engineering data from the same project are often unavailable, researchers have proposed cross-project defect prediction (CPDP) models, in which data collected from one or more projects are used to predict faults in other projects. A number of approaches have been proposed, with different levels of success and very limited repeatability. Goals: The purpose of this paper is to investigate the existing studies of cross-project models for defect prediction. It synthesizes the literature, focusing on characteristics such as project type, software metrics, data preprocessing techniques, feature selection approaches, classifiers, and performance measures used. Method: This paper follows the well-known Systematic Literature Review (SLR) approach proposed by Barbara Kitchenham in 2007. Results: Our findings show that most of the articles were published between 2015 and 2021. Moreover, the studies are mostly based on open-source datasets, and the software metrics used to create the models are mainly product metrics. We also found that most studies attempted to improve their models by improving data preprocessing and feature selection approaches. Furthermore, logistic regression, followed by naive Bayes and random forest, is the most adopted classifier technique in such models. Finally, the F-measure, followed by recall and AUC, is the most preferred measure for evaluating the performance of the models. Conclusions: This study provides an overview of the different approaches used to improve CPDP models, analyzing the techniques used for data preprocessing, feature selection, and the selection of classifiers. Moreover, we identified some aspects that need further investigation.

“…Saidi et al [17] present a feature selection method that incorporates the genetic algorithm (GA) and the Pearson correlation coefficient (PCC). Yu et al [18] examine the effectiveness of feature selection in CPDP using feature subset selection and feature ranking approaches. In contrast to conventional feature selection methods that only focus on finding a single discriminating feature, Mao and Yang [19] present a multilayer feature subset selection method that uses randomized searches and multilayer structures to select discriminative subsets.…”

Section: Literature Survey (mentioning)

confidence: 99%

BFEDroid: A Feature Selection Technique to Detect Malware in Android Apps Using Machine Learning

Chimeleze, Jamil, Ismail, et al. 2022

Security and Communication Networks

Malware detection refers to the process of detecting the presence of malware on a host system, or that of determining whether a specific program is malicious or benign. Machine learning-based solutions first gather information from applications and then use machine learning algorithms to develop a classifier that can distinguish between malicious and benign applications. Researchers and practitioners have long paid close attention to the issue. Most previous work has addressed the differences in feature importance or the computation of feature weights, which is unrelated to the classification model used, and therefore, the implementation of a selection approach with limited feature hiccups, and increases the execution time and memory usage. BFEDroid is a machine learning detection strategy that combines backward, forward, and exhaustive subset selection. This proposed malware detection technique can be updated by retraining new applications with true labels. It has higher accuracy (99%), lower memory consumption (1680), and a shorter execution time (1.264SI) than current malware detection methods that use feature selection.
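The citation statements above contrast two families of feature selection: feature ranking, which scores each feature on its own, and feature subset selection, which scores features as a group. The following is a minimal Python sketch of that difference only. It is not the method of any paper listed here. The metric names (`loc`, `loc_x10`, `churn`, `const`), the toy data, and the merit function (relevance minus mean redundancy, both measured with the Pearson correlation coefficient) are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / sqrt(vx * vy)


def rank_features(columns, labels):
    """Feature ranking: sort feature names by |correlation with the label|."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def forward_select(columns, labels, k):
    """Greedy forward subset selection (illustrative merit function).

    At each step, add the feature whose relevance to the label, minus its
    mean absolute correlation with the already selected features, is largest.
    """
    selected = []
    remaining = list(columns)

    def merit(name):
        relevance = abs(pearson(columns[name], labels))
        if not selected:
            return relevance
        redundancy = sum(
            abs(pearson(columns[name], columns[s])) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=merit)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 modules, label 1 = defective.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "loc": [1, 2, 3, 4, 5, 6, 7, 8],
    "loc_x10": [10, 20, 30, 40, 50, 60, 70, 80],  # redundant copy of loc
    "churn": [4, 3, 2, 5, 8, 7, 6, 1],
    "const": [3, 3, 3, 3, 3, 3, 3, 3],            # carries no information
}

ranking = rank_features(columns, labels)
subset = forward_select(columns, labels, k=2)
print("ranking:", ranking)
print("subset: ", subset)
```

On this toy data, ranking places both copies of the size metric at the top because each is scored in isolation, whereas the subset search keeps only one of them and then prefers `churn`, which is less relevant but far less redundant.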
