MPT‐embedding: An unsupervised representation learning of code for software defect prediction

Shi, Ke; Liu, Guangliang; Wei, Zhenchun; Chang, Jingfei

doi:10.1002/smr.2330

Cited by 11 publications

(18 citation statements)

References 48 publications

“…• MPT [6]: This approach to defect prediction uses multiperspective tree embedding to learn the representation of an AST in an unsupervised manner.…”

Section: Baseline Methods (mentioning)

confidence: 99%

“…The data set used in this work consists of source code from seven Java Apache projects collected from PROMISE, a publicly accessible repository of SDP research data collected by Jureczko and Madeyski [34]. Specifically, each project version from PROMISE is represented by a list of classes it consists of, and each class is described by 20 traditional code features, such as lines of code, and the defect label.…”

Section: A Data Set (mentioning)

confidence: 99%

“…Although there are some approaches for predicting defective software modules that encode the structural information of ASTs into features for describing modules, shortcomings can be observed in these approaches. In particular, such approaches compute features either on a bottom-up basis [8], using specially designed relationships between AST's nodes [6], or focus only on a part of an AST [12]. The bottom-up method used by Dam et al [8] computes the features of AST from leaf nodes to root nodes, which makes it difficult to capture the long-range dependencies between distant nodes.…”

mentioning

confidence: 99%

“…The bottom-up method used by Dam et al [8] computes the features of AST from leaf nodes to root nodes, which makes it difficult to capture the long-range dependencies between distant nodes. Furthermore, it is not entirely clear why it is necessary to define additional relationships in a tree besides edges, as done by Shi et al [6], since such relationships do not provide any new information about the tree itself or about the defectiveness of the source code. Moreover, using only a part of an AST to represent the whole software module, as in work by Xu et al [12], may be memory efficient, but it is indisputable that valuable information about the software module can be lost when neglecting the whole structure of the AST representing its source code.…”

mentioning

confidence: 99%
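
The bottom-up computation criticized in the statement above can be pictured with a toy sketch. This is not the model of Dam et al [8]; it only assumes a hypothetical `Node` class and a simple label-count feature, to show that each node's feature is built solely from its own label and its children's features, so information from distant nodes meets only at a common ancestor.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a type label plus its child nodes."""
    label: str
    children: List["Node"] = field(default_factory=list)


def bottom_up_counts(node: Node) -> Counter:
    """Toy bottom-up feature: label counts of the subtree rooted at `node`."""
    # Start with this node's own label.
    counts = Counter({node.label: 1})
    # Merge the features of the children, which are computed first (leaves to root).
    for child in node.children:
        counts += bottom_up_counts(child)
    return counts


# Illustrative tree; the labels are assumptions, not taken from the cited papers.
tree = Node("MethodDeclaration", [
    Node("IfStatement", [Node("MethodInvocation")]),
    Node("ReturnStatement"),
])
```

Calling `bottom_up_counts(tree)` returns the counts for the whole tree. The `MethodInvocation` leaf and the `ReturnStatement` node never see each other directly; their features are combined only at the root.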

Graph Neural Network for Source Code Defect Prediction

et al. 2022

Predicting defective software modules before testing is a useful operation that ensures that the time and cost of software testing can be reduced. In recent years, several models have been proposed for this purpose, most of which are built using deep learning-based methods. However, most of these models do not take full advantage of source code, as they ignore its tree structure or focus only on a small part of the code. To investigate whether and to what extent information from this structure can be beneficial in predicting defective source code, we developed an end-to-end model based on a convolutional graph neural network (GCNN) for defect prediction, whose architecture can be adapted to the analyzed software, so that projects of different sizes can be processed with the same level of detail. The model processes the information of the nodes and edges from the abstract syntax tree (AST) of the source code of a software module and classifies the module as defective or not defective based on this information. Experiments on open source projects written in Java have shown that the proposed model performs significantly better than traditional defect prediction models in terms of AUC and F-score. Based on the F-scores of the existing state-of-the-art models, the model has shown comparable predictive capabilities for the analyzed projects.

INDEX TERMS: Software defect prediction, deep learning, graph neural network.

“…The proposed framework achieved a better F-measure by 0.532 on average, compared with the selected baselines. Besides, Shi et al (2021) represented the code by different representations. The most crucial information of nodes was coded.…”

Section: E Framework (mentioning)

confidence: 99%

A Systematic Literature Review of Software Defect Prediction Using Deep Learning

Fathy, Abd-Elmegid, Bahaa et al. 2021

Journal of Computer Science

The approaches associated with software defect prediction are used to reduce the time and cost of discovering software defects in source code and to improve software quality in organizations. There are two approaches to revealing software defects in source code. The first concentrates on traditional features such as lines of code, code complexity, etc. However, these features fail to extract the semantics of the source code. The second concentrates on revealing these semantics. This paper presents a Systematic Literature Review (SLR) of software defect prediction using deep learning models. This SLR focuses on identifying the studies that use the semantics of the source code to improve defect prediction. It aims to analyze the datasets, models and frameworks used, and to identify the evaluation metrics to ensure their applicability in software defect prediction. The IEEE Xplore, Scopus and Web of Science digital libraries were used to select suitable primary studies. Forty (40) primary studies published by 15 December 2020 were selected for analysis based on the quality criteria. The project levels applied in the studies were within-project (52.5%), cross-project (17.5%), and both within-project and cross-project (30%). The datasets used were the Promise dataset (68.18%) and other datasets (31.82%). The most used deep learning model in the primary studies was the Convolutional Neural Network (CNN), at 35%. The most used evaluation metrics were F-measure and Area Under the Curve (AUC). Software defect prediction using deep learning models is still a valuable topic and requires many more research studies to enhance the performance of defect prediction.
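
For reference, the F-measure named in the abstract above is the harmonic mean of precision and recall. The following is a minimal sketch, assuming raw counts of true positives, false positives and false negatives for the "defective" class; the function name `f_measure` and the zero-division convention are assumptions for illustration, not taken from the reviewed studies.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from confusion-matrix counts.

    tp: defective modules correctly predicted as defective
    fp: clean modules wrongly predicted as defective
    fn: defective modules wrongly predicted as clean
    """
    # Precision: share of predicted-defective modules that are truly defective.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly defective modules that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumed convention: return 0.0 when both precision and recall are zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 8 true positives, 2 false positives and 2 false negatives, precision and recall are both 0.8, so the F-measure is also 0.8.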

A novel defect prediction method based on semantic feature enhancement

Zhang, Wang, Chen et al. 2024

J Software Evolu Process

Summary: Although cross‐project defect prediction (CPDP) techniques that use traditional manual features to build defect prediction models have been well developed, they usually ignore the semantic and structural information inside the program and fail to capture the hidden features that are critical for program category prediction, resulting in poor defect prediction results. Researchers have proposed using deep learning to automatically extract the semantic features of programs and fuse them with traditional features as training data. However, in practice, it is important to explore the effective representation of the semantic features in the programs and how fusing the two types of features in a reasonable ratio can maximize the effectiveness of the model. In this paper, we propose a semantic feature enhancement‐based defect prediction framework (SFE‐DP), which augments the semantic feature set extracted from the program code with data. We also introduce a self‐attention layer and a matching layer to filter low‐efficiency and non‐critical semantic features in the model structure. Finally, we adopt a hybrid loss function to iteratively optimize the model parameters. Extensive experiments validate that SFE‐DP can outperform the baseline approaches on 90 pairs of CPDP tasks formed by 10 open‐source projects.
