Convolutional neural networks on assembly code for predicting software defects

Phan, Anh Viet; Nguyen, Minh Le

doi:10.1109/iesys.2017.8233558

Cited by 29 publications

(20 citation statements)

References 17 publications


“…However, their results were not as good as results for Li's model [24]. There is also research on deep defect prediction targeting assembly code [54,55], both of which leveraged a CNN model to learn from assembly instructions.…”

Section: Defect Prediction Based On Deep Features (mentioning)

confidence: 95%
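Both cited works feed assembly instructions to a CNN. The sketch below shows the generic building block such models rest on: a 1-D convolution slid over a sequence of instruction embeddings, followed by ReLU and max-over-time pooling. It is a minimal illustration, not the architecture of [54,55]: treating each opcode as one token, the 2-dimensional embeddings, and the single hand-set filter are all hypothetical.

```python
def conv1d_max_pool(seq, filters, bias):
    """1-D convolution + ReLU + max-over-time pooling over an embedding sequence.

    seq:     list of T embedding vectors, each of length d
    filters: list of F filters; each filter is a list of k vectors of length d
    bias:    list of F scalars
    Returns one pooled feature per filter.
    """
    features = []
    for w, b in zip(filters, bias):
        k = len(w)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor (also used if T < k)
        for t in range(len(seq) - k + 1):
            # Dot product between the filter and the window seq[t : t + k]
            s = b + sum(x * y for i in range(k) for x, y in zip(seq[t + i], w[i]))
            best = max(best, max(0.0, s))
        features.append(best)
    return features


# Hypothetical 2-d embeddings for three opcodes (illustrative values, not learned)
EMBED = {"mov": [1.0, 0.0], "add": [0.0, 1.0], "call": [1.0, 1.0]}
program = ["mov", "add", "call", "mov"]
seq = [EMBED[op] for op in program]

# One hand-set filter of width 2 that responds most strongly to "add" followed by "call"
filters = [[[0.0, 1.0], [1.0, 1.0]]]
print(conv1d_max_pool(seq, filters, bias=[0.0]))  # [3.0]
```

In a trained model the embeddings and filters are learned rather than hand-set, and the pooled features typically go to a fully connected classifier.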

An Improved CNN Model for Within-Project Software Defect Prediction

Pan et al. 2019

Applied Sciences


To improve software reliability, software defect prediction is used to find software bugs and prioritize testing efforts. Recently, some researchers introduced deep learning models, such as the deep belief network (DBN) and the state-of-the-art convolutional neural network (CNN), and used features automatically extracted from abstract syntax trees (ASTs) to improve defect prediction performance. However, the research on the CNN model failed to reach clear conclusions because of its limited dataset size, insufficiently repeated experiments, and outdated baseline selection. To solve these problems, we built the PROMISE Source Code (PSC) dataset to enlarge the original dataset used in the CNN research, which we refer to as the Simplified PROMISE Source Code (SPSC) dataset. We then proposed an improved CNN model for within-project defect prediction (WPDP) and compared our results with the existing CNN results and with an empirical study. Our experiments were based on a 30-repetition holdout validation and a 10 × 10 cross-validation. Experimental results showed that our improved CNN model was comparable to the existing CNN model and significantly outperformed state-of-the-art machine learning models for WPDP. Furthermore, we defined hyperparameter instability and examined the threat and opportunity it presents for deep learning models in defect prediction.
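The abstract evaluates with a 30-repetition holdout and a 10 × 10 cross-validation. A minimal sketch of how such splits can be generated, assuming a 70/30 train/test holdout ratio and plain (unstratified) shuffling; the abstract states neither detail, and the function names are hypothetical.

```python
import random


def repeated_holdout(n, repetitions=30, test_ratio=0.3, seed=0):
    """Yield (train_idx, test_idx) for repeated random holdout validation."""
    rng = random.Random(seed)
    n_test = round(n * test_ratio)  # assumed ratio, not taken from the paper
    for _ in range(repetitions):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repetition
        yield idx[n_test:], idx[:n_test]


def repeated_kfold(n, k=10, repetitions=10, seed=0):
    """Yield (train_idx, test_idx) for `repetitions` x `k`-fold cross-validation.

    Each repetition reshuffles the indices, so every sample appears in exactly
    one test fold per repetition.
    """
    rng = random.Random(seed)
    for _ in range(repetitions):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]


splits = list(repeated_kfold(n=100))
print(len(splits))  # 100 train/test splits (10 repetitions x 10 folds)
```

Per-split metrics would then be averaged over the 30 holdout runs or the 100 cross-validation folds. Because defect datasets are usually imbalanced, a stratified variant that preserves the defective/clean ratio in each fold is a common alternative.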


“…Guo et al. [31] presented a solution that uses recurrent neural network models to perform software traceability. In addition, deep learning models have also been used in vulnerability detection [25], [26], bug localization [23], [24], defect prediction on assembly code [54], [55], etc.…”

Section: B. Deep Learning and Software Engineering (mentioning)

confidence: 99%

Seml: A Semantic LSTM Model for Software Defect Prediction

Liu, Jiang, et al. 2019

IEEE Access


Software defect prediction can assist developers in finding potential bugs and reducing maintenance costs. Traditional approaches usually use software metrics (lines of code, cyclomatic complexity, etc.) as features to build classifiers and identify defective software modules. However, software metrics often fail to capture the syntactic and semantic information of programs. In this paper, we propose Seml, a novel framework that combines word embedding and deep learning methods for defect prediction. Specifically, for each program source file, we first extract a token sequence from its abstract syntax tree. Then, we map each token in the sequence to a real-valued vector using a mapping table, which is trained with an unsupervised word embedding model. Finally, we use the vector sequences and their labels (defective or non-defective) to build a Long Short-Term Memory (LSTM) network. The LSTM model can automatically learn the semantic information of programs and perform defect prediction. The evaluation results on eight open-source projects show that Seml outperforms three state-of-the-art defect prediction approaches on most of the datasets for both within-project and cross-project defect prediction.

INDEX TERMS Defect prediction, Long Short-Term Memory network, word embedding.
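The middle step of this pipeline, mapping each AST token to a vector through a pretrained table, can be sketched as below. The table contents, the zero vector for out-of-vocabulary tokens, and the fixed-length truncation and padding are assumptions made for illustration; the abstract does not say how Seml handles them. The resulting vector sequence is what would be fed to the LSTM.

```python
def to_vector_sequence(tokens, table, dim, max_len):
    """Map an AST token sequence to a fixed-length sequence of embedding vectors.

    table:   dict token -> list of `dim` floats (assumed pretrained, e.g. by word2vec)
    max_len: longer sequences are truncated, shorter ones zero-padded at the end
    Tokens missing from the table map to the zero vector (an assumption).
    """
    zero = [0.0] * dim
    vecs = [list(table.get(tok, zero)) for tok in tokens[:max_len]]
    vecs += [list(zero) for _ in range(max_len - len(vecs))]
    return vecs


# Hypothetical 2-d embedding table
table = {"FileReader": [0.1, 0.2], "close": [0.3, -0.1]}
print(to_vector_sequence(["FileReader", "read", "close"], table, dim=2, max_len=4))
# [[0.1, 0.2], [0.0, 0.0], [0.3, -0.1], [0.0, 0.0]]
```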


“…Each non-leaf node holds a vector θ that has the same dimension as the word vector. Given the context projection x_w of the central word w as input, the conditional probability of the word w is computed as shown in (1) and (2).…”

Section: Hierarchical Softmax (mentioning)

confidence: 99%
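The quote does not reproduce equations (1) and (2). Assuming they follow the standard word2vec CBOW formulation of hierarchical softmax, the probability of w is a product over the non-leaf nodes on its Huffman-tree path: each node contributes σ(x_wᵀθ) when the corresponding code bit d_j is 0 and 1 - σ(x_wᵀθ) when it is 1. The sketch below computes that product; the toy tree, the vectors, and this bit convention are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hs_probability(x_w, path_thetas, code):
    """p(w | context) under hierarchical softmax (word2vec CBOW convention).

    x_w:         context projection of the central word w
    path_thetas: vectors theta of the non-leaf nodes on the root-to-leaf path of w,
                 each with the same dimension as x_w
    code:        Huffman code bits d_j of w, one per path node
    """
    p = 1.0
    for theta, d in zip(path_thetas, code):
        s = sigmoid(sum(a * b for a, b in zip(x_w, theta)))
        p *= s if d == 0 else 1.0 - s  # bit 0 -> sigma, bit 1 -> 1 - sigma
    return p


# Toy 4-leaf tree: a root plus two inner nodes (illustrative vectors)
x = [0.5, -0.2]
root, left, right = [0.3, 0.1], [-0.4, 0.2], [0.1, 0.9]
leaves = {
    "w0": ([root, left], [0, 0]),
    "w1": ([root, left], [0, 1]),
    "w2": ([root, right], [1, 0]),
    "w3": ([root, right], [1, 1]),
}
total = sum(hs_probability(x, thetas, code) for thetas, code in leaves.values())
print(round(total, 6))  # 1.0
```

Because the two branches at every node sum to one, the leaf probabilities form a proper distribution, which is why the total printed above is 1.0. Computing p(w) this way costs one dot product per tree level instead of one per vocabulary word.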

“…We used the open-source Python project javalang [29] as the tool to parse the Java code in the PROMISE library into ASTs. Following Phan et al.'s research [2], we pick only three types of AST nodes as tokens. The first type is nodes associated with class instantiation and method invocation; we use their method name or class name as the token. The second type is declaration nodes, such as method declarations, type declarations, interface declarations and enumeration declarations.…”

Section: B. Parsing Source Code and Select AST Node We Need (mentioning)

confidence: 99%
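The node-selection step in the quote can be sketched as a filter over javalang's AST nodes, dispatching on javalang's node class names (`MethodInvocation.member`, `ClassCreator.type.name`, and `.name` on declaration nodes). Using the declared name as the token for declaration nodes is an assumption, since the quote is cut off before saying, and the third node type it mentions is not covered. `select_tokens` is a hypothetical helper, not code from the cited paper.

```python
# javalang.tree class names for the two node groups listed in the quote
DECLARATION_NODES = {"MethodDeclaration", "ClassDeclaration",
                     "InterfaceDeclaration", "EnumDeclaration"}


def select_tokens(nodes):
    """Keep only selected AST node types and turn each into a token string."""
    tokens = []
    for node in nodes:
        kind = type(node).__name__
        if kind == "MethodInvocation":
            tokens.append(node.member)        # name of the invoked method
        elif kind == "ClassCreator":
            tokens.append(node.type.name)     # name of the instantiated class
        elif kind in DECLARATION_NODES:
            tokens.append(node.name)          # declared name (assumption)
    return tokens


# Usage with javalang (requires `pip install javalang`):
#   import javalang
#   tree = javalang.parse.parse(java_source)   # java_source: a Java file's text
#   tokens = select_tokens(node for _, node in tree)
```

Dispatching on class names rather than importing javalang keeps the filter testable without the parser installed; with javalang available, `isinstance` checks against `javalang.tree` classes would be the more explicit choice.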

An Abstract Syntax Tree Encoding Method for Cross-Project Defect Prediction

Cai and Qiu 2019

IEEE Access


In the last few years, with the development of deep learning theory, researchers have tried to introduce artificial intelligence methods into the field of software defect prediction (SDP) to improve its prediction performance. To be fed into a neural network, the sample code is represented as an abstract syntax tree (AST), and the AST is encoded as real numbers. However, in most cross-project defect prediction (CPDP) tasks, the methods for converting an AST into real numbers cannot effectively estimate the semantic distance between ASTs, which significantly reduces training effectiveness. To solve that problem, we present a new encoding framework, tree-based-embedding (TBE), to convert ASTs into real vectors and make the semantic gap between ASTs measurable. To evaluate this encoding method, we propose a tree-based-embedding convolutional neural network with transferable hybrid feature learning (TBCNN-THFL) to perform CPDP tasks. TBCNN-THFL is fed data encoded with the TBE method to learn transferable joint features across different projects; meanwhile, TBCNN-THFL incorporates a transfer component analysis algorithm. Furthermore, the model combines handcrafted and deep-learning-generated features and then feeds them into the classifier to train a defect prediction model. Extensive experiments demonstrate that TBCNN-THFL outperforms the reference models on 72 pairs of CPDP tasks formed from 9 open-source projects.

INDEX TERMS Software engineering, software defect prediction, cross-project defect prediction, deep learning, continuous bag-of-words, transfer component algorithm.
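The count of 72 CPDP tasks matches the number of ordered (source, target) pairs of distinct projects that can be formed from 9 projects, 9 × 8 = 72. Assuming each task trains on one project and tests on a different one, the task list can be enumerated as below; the project names are placeholders.

```python
from itertools import permutations

projects = [f"P{i}" for i in range(1, 10)]   # placeholders for the 9 projects
tasks = list(permutations(projects, 2))      # ordered (source, target), source != target
print(len(tasks))  # 72
```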
