Deep Semantic Feature Learning for Software Defect Prediction

Wang, Song; Liu, Taiyue; Nam, Jaechang; Tan, Lin

doi:10.1109/tse.2018.2877612

Cited by 177 publications (134 citation statements)

References 95 publications (233 reference statements)


“…However, some of previous studies argued that these threshold‐dependent performance measures are problematic. For example, these measures depend on an arbitrarily selected threshold, and these measures are sensitive to class imbalanced problem existed in most of the gathered SDP datasets.…”

Section: Methods (mentioning)
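The quoted criticism can be made concrete with a small sketch. It assumes made-up predicted probabilities for ten modules, only two of which are defective. F1 changes depending on which cut-off is chosen, while AUC, which compares every defective/clean pair, needs no threshold at all. The function names and data are hypothetical illustrations, not taken from any of the cited studies.

```python
def confusion_at(scores, labels, threshold):
    """Count TP, FP, FN when modules scoring >= threshold are predicted defective."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, fn


def f1_at(scores, labels, threshold):
    """Threshold-dependent measure: F1 of the 'defective' class."""
    tp, fp, fn = confusion_at(scores, labels, threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(scores, labels):
    """Threshold-free measure: chance a defective module outranks a clean one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for 10 modules, only 2 of them defective (imbalanced).
scores = [0.9, 0.4, 0.35, 0.3, 0.6, 0.2, 0.1, 0.15, 0.05, 0.25]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

for t in (0.5, 0.3):
    print(f"threshold={t}: F1={f1_at(scores, labels, t):.3f}")
print(f"AUC={auc(scores, labels)}")
```

On this toy data, F1 is 0.500 at a threshold of 0.5 and 0.571 at 0.3, while AUC stays at 0.9375 regardless of any cut-off.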


Do different cross‐project defect prediction methods identify the same defective modules?

Chen, Qu, et al. 2019

Journal of Software: Evolution and Process


Cross‐project defect prediction (CPDP) is needed when the target projects are new or have little training data, since such projects lack sufficient historical data to build high‐quality prediction models. Researchers have proposed many CPDP methods, and previous studies have extensively compared their performance. However, to the best of our knowledge, it remains unclear whether different CPDP methods identify the same defective modules; this issue has not been thoroughly explored. In this article, we select 12 state‐of‐the‐art CPDP methods, including eight supervised methods and four unsupervised methods. We first compare the performance of these methods under the same experimental settings on five widely used datasets (i.e., NASA, SOFTLAB, PROMISE, AEEEM, and ReLink) and rank them via the Scott‐Knott test. The results confirm the competitiveness of unsupervised methods. We then perform a diversity analysis of the defective modules identified by these methods using the McNemar test. Empirical results show that different CPDP methods may differ in the modules they predict as defective, especially when supervised methods are compared with unsupervised methods. Finally, we find a certain number of defective modules that cannot be correctly identified by any of the CPDP methods or that only one CPDP method identifies correctly. These findings can inform the design of more effective methods to further improve CPDP performance.
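The abstract uses the McNemar test to check whether two CPDP methods disagree on which modules they get right. A minimal sketch of that test on paired per-module outcomes follows; it assumes the common continuity-corrected form of the statistic, which may differ from the exact variant the authors used, and the outcome lists are hypothetical.

```python
import math


def mcnemar(correct_a, correct_b):
    """McNemar test on paired per-module outcomes of two prediction methods.

    correct_a[i] / correct_b[i] is True when method A / B classifies module i correctly.
    Returns the continuity-corrected chi-square statistic and its p-value (1 degree of freedom).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both methods must be evaluated on the same modules")
    only_a = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    only_b = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0  # the methods agree on every module
    stat = max(0, abs(only_a - only_b) - 1) ** 2 / discordant
    # Survival function of chi-square with 1 df: P(Z**2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical outcomes on 20 modules: 10 only A gets right, 2 only B gets right,
# 5 both get right, 3 both get wrong.
method_a = [True] * 10 + [False] * 2 + [True] * 5 + [False] * 3
method_b = [False] * 10 + [True] * 2 + [True] * 5 + [False] * 3
stat, p = mcnemar(method_a, method_b)
print(f"chi2={stat:.3f}, p={p:.3f}")
```

Only the discordant modules (those exactly one method classifies correctly) enter the statistic. Here it is 49/12, about 4.08, which exceeds the 3.84 critical value of a one-degree-of-freedom chi-square at α = 0.05.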


“…Jing et al considered subclass discriminant analysis (SDA) method. Wang et al resorted to deep learning. They utilized deep belief network (DBN) to automatically learn semantic features from token vectors extracted from abstract syntax trees (ASTs) of program modules.…”

Section: Background and Related Work (mentioning)
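The quote describes learning semantic features from token vectors extracted from ASTs of program modules. The sketch below covers only that token-vector step, not the deep belief network. It assumes Python's standard `ast` module as a stand-in for a Java parser, a hypothetical choice of node types to keep, and integer 0 reserved for padding. None of these choices come from the cited work.

```python
import ast

# Hypothetical choice of node types to keep (declarations, calls, control flow).
SELECTED = (ast.FunctionDef, ast.ClassDef, ast.Call, ast.If, ast.For,
            ast.While, ast.Try, ast.Return)


def ast_tokens(source):
    """Pre-order walk of the AST, emitting one token per selected node."""
    tokens = []

    def visit(node):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens.append("call:" + node.func.id)  # keep the callee name for calls
        elif isinstance(node, SELECTED):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def encode(token_lists):
    """Map tokens to integers (0 is reserved for padding) and pad to equal length."""
    vocab = {}
    vectors = [[vocab.setdefault(t, len(vocab) + 1) for t in tokens] for tokens in token_lists]
    width = max(len(v) for v in vectors)
    return [v + [0] * (width - len(v)) for v in vectors], vocab


module_a = "def f(x):\n    if x:\n        return g(x)\n    return 0\n"
module_b = "for i in range(3):\n    print(i)\n"
vectors, vocab = encode([ast_tokens(module_a), ast_tokens(module_b)])
print(vectors)  # [[1, 2, 3, 4, 3], [5, 6, 7, 0, 0]]
```

Padding to a common width is what lets modules of different sizes be fed to a fixed-size network input.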


Do different cross‐project defect prediction methods identify the same defective modules?

Chen, Qu, et al. 2019

Journal of Software: Evolution and Process


“…In this paper, we proposed a new CPDP approach called TCNN to obtain the transferable semantic (TCNN-generated) features for cross-project prediction. The key improvement is that, considering the data distribution divergence between projects, TCNN transforms CNN by imbedding the representations of project-specific data to an RKHS for distribution matching. Comprehensive experiments results showed that our TCNN can achieve better prediction performance over classic CPDP methods (e.g., NNFilter [13], data gravitation (DG) [14], TCA+ [10]) and state-of-the-art DL-based approaches (e.g., DBN [9], defect prediction through CNN (DPCNN) [8]) on 90 pairs of CPDP tasks formed by 10 open-source projects.…”

Section: Source Project (mentioning)


“…In traditional CPDP methods, handcrafted features are commonly adopted to perform CPDP (e.g., Halstead features based on operators and operands [5], McCabe features based on dependencies [6], and CK features based on the object-oriented concept [7]). In recent years, some researchers [8,9] suggested that the generic convolutional neural network (CNN) and deep belief network (DBN) models could extract semantic and structural features from project programs and applied them to perform SDP for better prediction performance. We call these features deep-learning-generated (DL-generated) features.…”

Section: Introduction (mentioning)
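The quote mentions Halstead features built from operators and operands. As a rough illustration of that handcrafted feature family, the sketch computes the classic Halstead vocabulary, length, volume, difficulty, and effort. It assumes the source has already been split into operator and operand tokens (that classification is language-specific), and the example statement is hypothetical.

```python
import math


def halstead(operators, operands):
    """Classic Halstead measures from pre-classified operator and operand token lists."""
    if not operators or not operands:
        raise ValueError("need at least one operator and one operand")
    n1, n2 = len(set(operators)), len(set(operands))  # distinct operators / operands
    N1, N2 = len(operators), len(operands)  # total occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


# Hand-classified tokens for the statement `a = b + b * c` (hypothetical example).
metrics = halstead(operators=["=", "+", "*"], operands=["a", "b", "b", "c"])
print(metrics)
```

For this statement, the vocabulary is 6, the length is 7, and the difficulty is 2.0. Unlike DL-generated features, these numbers ignore token order entirely.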


“…In a previous study [9], it was assumed that the semantic features extracted by DBN can capture the common characteristics related to defects in the source code, so the features extracted from the source project can be directly applied to the target project. However, due to different scales, functions, and coding rules of software, the data in different projects would show distribution divergence [10,11].…”

Section: Introduction (mentioning)


Transfer Convolutional Neural Network for Cross-Project Defect Prediction

Qiu, Deng, et al. 2019

Applied Sciences


Cross-project defect prediction (CPDP) is a practical solution that allows software defect prediction (SDP) to be used earlier in the software lifecycle. With CPDP, a defect predictor trained on labeled data from mature projects can be applied to the prediction task of a new project. Most previous CPDP approaches ignored the semantic information in the source code, and existing semantic-feature-based SDP methods do not take into account the data distribution divergence between projects. These limitations may weaken defect prediction performance. To address these problems, we propose a novel approach, the transfer convolutional neural network (TCNN), to mine transferable semantic (deep-learning (DL)-generated) features for CPDP tasks. Specifically, our approach first parses each source file into an integer vector that serves as the network input. Next, to obtain the TCNN model, a matching layer is added to a convolutional neural network, in which the hidden representations of the source and target project-specific data are embedded into a reproducing kernel Hilbert space for distribution matching. By simultaneously minimizing the classification error and the distribution divergence between projects, the constructed TCNN can extract transferable DL-generated features. Finally, to retain the information contained in handcrafted features, we combine them with the transferable DL-generated features to form joint features for CPDP. Experiments on 10 benchmark projects (90 pairs of CPDP tasks) showed that the proposed TCNN method is superior to the reference methods.
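The abstract describes embedding source and target hidden representations into a reproducing kernel Hilbert space and minimizing their divergence together with the classification error. A common way to measure such divergence is the maximum mean discrepancy (MMD). The sketch below computes a biased (V-statistic) estimate of squared MMD with a Gaussian kernel and adds it to a classification loss. It is an assumption that TCNN uses this estimator; the kernel width `sigma`, the weight `lam`, and the sample vectors are all hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased empirical squared MMD between two samples in the Gaussian-kernel RKHS."""
    def mean_k(xs, ys):
        return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))
    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)


def joint_loss(classification_loss, source_hidden, target_hidden, lam=0.5, sigma=1.0):
    """Hypothetical transfer objective: classification error plus weighted divergence."""
    return classification_loss + lam * mmd2(source_hidden, target_hidden, sigma)


# Hypothetical 2-D hidden representations from a source and two candidate target projects.
source_hidden = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
near_target = [[0.1, 0.1], [0.2, 0.1], [0.0, 0.2]]
far_target = [[2.0, 2.1], [2.2, 2.0], [2.1, 2.2]]
print(mmd2(source_hidden, near_target), mmd2(source_hidden, far_target))
```

A target whose representations overlap the source yields a small MMD term. A shifted target yields a large one, so minimizing `joint_loss` pushes the network toward representations that look alike across projects.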


Prediction of software defects using deep learning with improved cuckoo search algorithm

Badvath, Miriyala, Gunupudi, et al. 2022

Concurrency and Computation


Summary: Software projects need a defect prediction model to find defect‐prone files in software systems. Fault‐prone module prediction, bug prediction, and bug removal can help the software industry achieve software quality. Therefore, automatically forecasting the number of errors in software modules is important, and it may help developers allocate limited resources more efficiently. Several methods for detecting and repairing such flaws at low cost have been offered; however, these approaches need to be significantly improved in terms of performance. Hence, in this article, we implement an ensemble technique for software defect prediction and software bug prediction. We also propose a hybrid technique to predict the number of defects in a software system. The proposed approach uses principal component analysis for feature extraction to further improve performance and control the optimization problem. Classifiers were applied to five PROMISE datasets to determine the best-performing classifier with respect to the prediction performance measures. Our proposed model yields better results on defect prediction problems and shows improvement over the existing model.
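The abstract names principal component analysis as its feature-extraction step. A dependency-free sketch of PCA follows: it centers the metrics, builds the sample covariance matrix, finds the leading eigenvectors by power iteration with deflation, and projects onto them. Power iteration is swapped in for a library eigen-solver purely to keep the example self-contained. The module metrics and the choice of `k` are hypothetical, and this is not the authors' pipeline.

```python
import math
import random


def center(rows):
    """Subtract each feature's mean so that PCA works on zero-mean data."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(cov, k, iters=500, seed=0):
    """Leading eigenvectors of a covariance matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    mat = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return comps  # no variance left to explain
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Remove the found direction so the next pass finds the next component.
        mat = [[mat[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_transform(rows, k):
    """Project rows onto (at most) their first k principal components."""
    centered = center(rows)
    comps = top_components(covariance(centered), k)
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in centered]


# Hypothetical metrics for four modules: two correlated features and one constant flag.
metrics = [[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]]
print(pca_transform(metrics, k=1))
```

Because the two varying features here are perfectly correlated, a single component carries all of the variance, which is the kind of redundancy PCA is meant to compress before classification.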

