Jingfei Chang scite author profile

Software project defect prediction can help developers allocate debugging resources. Existing software defect prediction models are usually based on machine learning methods, especially deep learning. Deep learning‐based methods tend to build end‐to‐end models that directly use source code‐based abstract syntax trees (ASTs) as input. They do not pay enough attention to the front‐end data representation. In this paper, we propose a new framework to represent source code called multiperspective tree embedding (MPT‐embedding), which is an unsupervised representation learning method. MPT‐embedding parses the nodes of ASTs from multiple perspectives and encodes the structural information of a tree into a vector sequence. Experiments on both cross‐project defect prediction (CPDP) and within‐project defect prediction (WPDP) show that, on average, MPT‐embedding provides improvements over the state‐of‐the‐art method.
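As a rough illustration of what it means to encode a tree as a vector sequence, the sketch below walks an AST in pre-order and emits one small tuple per node. It is not the MPT-embedding method: the abstract does not say which perspectives are used, so the three features here (node type, depth, child count) and the function name `node_sequence` are assumptions chosen for illustration, and Python's built-in `ast` module stands in for whatever parser the authors used.

```python
import ast


def node_sequence(source: str):
    """Return one (node_type, depth, child_count) tuple per AST node, in pre-order.

    The three features are illustrative "perspectives", not those of the paper.
    """
    tree = ast.parse(source)
    rows = []

    def visit(node, depth):
        children = list(ast.iter_child_nodes(node))
        rows.append((type(node).__name__, depth, len(children)))
        for child in children:
            visit(child, depth + 1)

    visit(tree, 0)
    return rows


seq = node_sequence("def f(x):\n    return x + 1\n")
for row in seq[:4]:
    print(row)
```

A real pipeline would map each tuple to a learned embedding rather than using the raw values directly.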

PathPair2Vec: An AST path pair-based code representation method for defect prediction

Shi, Chang, Wei. 2020. Journal of Computer Languages.

Automatic channel pruning via clustering and swarm intelligence optimization for CNN

Chang, Xue, et al. 2022. Appl Intell.

Convolutional Neural Networks-Based Locating Relevant Buggy Code Files for Bug Reports Affected by Data Imbalance

et al. 2019

Software bug localization is very important in software engineering, but it is also complicated and time-consuming. To improve developer efficiency, researchers have developed various traditional and machine learning-based bug localization methods. In this paper, we propose a novel method that improves bug localization performance. First, surface lexical correlation matching between bug reports and source code files is used to obtain features with a deep neural network. Second, to bridge the lexical gap between bug reports and source code files, semantic correlation matching between them is used to obtain features based on word embedding and sentence embedding, again with a deep neural network. Then, the features obtained by surface lexical and semantic correlation matching are fused into a unified feature representation for bug reports and source code files. In addition, since our experimental datasets are imbalanced, we use a focal loss function to mitigate the impact of data imbalance. Finally, our method achieves relatively high bug localization performance compared to other classic methods.
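The focal loss mentioned in this abstract down-weights examples the model already classifies well, so that the rare class contributes more to training. A minimal sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), follows. The values `gamma=2.0` and `alpha=0.25` are commonly used defaults and are assumptions here; the abstract does not state which settings the authors chose.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single example.

    p: predicted probability of the positive class; y: true label (0 or 1).
    gamma and alpha are illustrative defaults, not values from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: strongly down-weighted
hard = focal_loss(0.1, 1)   # confident and wrong: keeps most of its loss
print(easy, hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.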

Mapping Bug Reports to Relevant Source Code Files Based on the Vector Space Model and Word Embedding

et al. 2019

Although software bug localization in software maintenance and evolution is cumbersome and time-consuming, it is also very important, especially for large-scale software projects. To lighten the workload of developers, researchers have developed various information retrieval (IR)-based bug localization models for automated software support. In this paper, we propose a new method that reduces the time required for bug localization. First, the surface lexical similarity between a bug report and a source code file is calculated based on the vector space model. Second, to address the lexical gap between the programming language and natural language, word vectors are used to calculate the semantic similarity between the bug report and the source code file. Then, we combine the surface lexical and semantic similarities into a total similarity for detecting buggy source code files. Our experimental word vectors are derived from Skip-gram and GloVe model training. We select an optimal 100-dimensional word vector for bug localization by evaluating it on four open-source software projects. Finally, our experimental results show that our method outperforms classical IR-based methods in locating relevant source code files on several indicators.
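The two-part similarity described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the lexical part uses raw term counts as the vector space model (TF-IDF weighting is a common alternative), the semantic part averages word vectors, the two are combined with a hypothetical weight `lam`, and the two-dimensional embeddings are toy values rather than trained Skip-gram or GloVe vectors.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lexical_similarity(a, b):
    """Vector space model with term-count weights over the shared vocabulary."""
    vocab = sorted(set(a) | set(b))
    ca, cb = Counter(a), Counter(b)
    return cosine([ca[t] for t in vocab], [cb[t] for t in vocab])


def mean_vector(tokens, emb, dim):
    """Average the embeddings of the tokens that have one."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def semantic_similarity(a, b, emb, dim):
    return cosine(mean_vector(a, emb, dim), mean_vector(b, emb, dim))


def total_similarity(a, b, emb, dim, lam=0.5):
    """Weighted sum of both scores; lam is a hypothetical mixing weight."""
    return lam * lexical_similarity(a, b) + (1 - lam) * semantic_similarity(a, b, emb, dim)


# Toy 2-dimensional embeddings (hypothetical values for illustration only).
emb = {"crash": [1.0, 0.0], "exception": [0.9, 0.1],
       "click": [0.1, 0.9], "button": [0.0, 1.0]}
report = ["crash", "click"]
source_file = ["exception", "button"]

lex = lexical_similarity(report, source_file)
sem = semantic_similarity(report, source_file, emb, 2)
total = total_similarity(report, source_file, emb, 2)
print(lex, sem, total)
```

The toy report and file share no tokens, so the lexical score is zero while the semantic score stays high, which is the lexical gap the abstract refers to.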


MPT‐embedding: An unsupervised representation learning of code for software defect prediction
