Lina Gong scite author profile

Balancing Privacy and Utility in Cross-Company Defect Prediction

Background: Cross-company defect prediction (CCDP) is a field of study where an organization lacking enough local data can use data from other organizations for building defect predictors. To support CCDP, data must be shared. Such shared data must be privatized, but that privatization could severely damage the utility of the data. Aim: To enable effective defect prediction from shared data while preserving privacy. Method: We explore privatization algorithms that maintain class boundaries in a dataset. CLIFF is an instance pruner that deletes irrelevant examples. MORPH is a data mutator that moves the data a random distance, taking care not to cross class boundaries. CLIFF+MORPH are tested in a CCDP study among 10 defect datasets from the PROMISE data repository. Results: We find: 1) The CLIFFed+MORPHed algorithms provide more privacy than the state-of-the-art privacy algorithms; 2) in terms of utility measured by defect prediction, we find that CLIFF+MORPH performs significantly better. Conclusions: For the OO defect data studied here, data can be privatized and shared without a significant degradation in utility. To the best of our knowledge, this is the first published result where privatization does not compromise defect prediction.


Empirical Evaluation of the Impact of Class Overlap on Software Defect Prediction

Gong

Jiang

Wang

et al. 2019


A Novel Class-Imbalance Learning Approach for Both Within-Project and Cross-Project Defect Prediction

Gong

Jiang

et al. 2020

IEEE Trans. Rel.


Conditional Domain Adversarial Adaptation for Heterogeneous Defect Prediction

Gong

Jiang

2020

IEEE Access


Predicting Software Quality by Optimized BP Network Based on PSO

Li

Kou

Gong

2011

JCP


The prediction model of software quality is the key technology in the software quality evaluation system, which can be used to evaluate software quality characteristics that users care about. Prediction models are often used to find the nonlinear relationship between metric data and quality factors. The paper predicts the relationship between metric data and quality factors from historical data by using a BP network optimized with PSO. The experiment uses 28 groups of data and compares the results with those of a plain BP network. Experiments show that the algorithm performs better than the BP network algorithm and solves the problems of slow convergence and of easily falling into local minima.


An improved transfer adaptive boosting approach for mixed‐project defect prediction

Gong

Jiang

2019

J Software Evolu Process


Software defect prediction (SDP) has been a very important research topic in software engineering, since it can provide high-quality results when given sufficient historical data of the project. Unfortunately, there are not abundant data to build the defect prediction model at the beginning of a project. For this scenario, one possible solution is to use data from other projects in the same company. However, using these data in practice would yield poor performance because of different distributional characteristics among projects. Also, software has more non-defective instances than defective instances that may cause a significant bias towards defective instances. Considering these two problems, we propose an improved transfer adaptive boosting (ITrAdaBoost) approach for the case where a small number of labeled instances are available in the testing project. In our approach, ITrAdaBoost not only employs the Matthews correlation coefficient (MCC) as the measure instead of accuracy rate but also uses asymmetric misclassification costs for non-defective and defective instances. Extensive experiments on 18 public projects from four datasets indicate that: (a) our approach significantly outperforms state-of-the-art cross-project defect prediction (CPDP) approaches, and (b) our approach can obtain prediction performance comparable to within-project prediction results. Consequently, the proposed approach can build an effective prediction model with a small number of labeled instances for mixed-project defect prediction (MPDP).
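The abstract's preference for MCC over accuracy rate can be illustrated with a small sketch. This is not the ITrAdaBoost algorithm itself; the function names and the confusion-matrix counts below are hypothetical and chosen only to show how the two measures behave on class-imbalanced defect data.

```python
import math

def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of modules labeled correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical project: 90 non-defective modules, 10 defective modules.
# Predictor 1 labels every module as non-defective.
always_clean = dict(tp=0, fp=0, tn=90, fn=10)
# Predictor 2 finds 6 of the 10 defective modules at the cost of 9 false alarms.
mixed = dict(tp=6, fp=9, tn=81, fn=4)

for name, counts in [("always_clean", always_clean), ("mixed", mixed)]:
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

On these made-up counts the trivial predictor has the higher accuracy (0.9 versus 0.87) but an MCC of 0, while the predictor that actually finds defects scores an MCC of about 0.42, which is the kind of behavior that motivates using MCC on imbalanced data.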


The impact of feature selection techniques on effort‐aware defect prediction: An empirical study

Keung

et al. 2023

IET Software


Effort‐Aware Defect Prediction (EADP) methods sort software modules based on the defect density and guide the testing team to inspect the modules with high defect density first. Previous studies indicated that some feature selection methods could improve the performance of Classification‐Based Defect Prediction (CBDP) models, and the Correlation‐based feature subset selection method with the Best First strategy (CorBF) performed the best. However, the practical benefits of feature selection methods on EADP performance are still unknown, and blindly employing the best‐performing CorBF method in CBDP to pre‐process the defect datasets may not improve the performance of EADP models but possibly result in performance degradation. To assess the impact of the feature selection techniques on EADP, a total of 24 feature selection methods with 10 classifiers embedded in a state‐of‐the‐art EADP model (CBS+) on the 41 PROMISE defect datasets were examined. We employ six evaluation metrics to assess the performance of EADP models comprehensively. The results show that (1) The impact of the feature selection methods varies in classifiers and datasets. (2) The four wrapper‐based feature subset selection methods with forwards search, that is, AdaBoost with Forwards Search, Deep Forest with Forwards Search, Random Forest with Forwards Search, and XGBoost with Forwards Search (XGBF) are better than other methods across the studied classifiers and the used datasets. And XGBF with XGBoost as the embedded classifier in CBS+ performs the best on the datasets. (3) The best‐performing CorBF method in CBDP does not perform well on the EADP task. (4) The selected features vary with different feature selection methods and different datasets, and the features noc (number of children), ic (inheritance coupling), cbo (coupling between object classes), and cbm (coupling between methods) are frequently selected by the four wrapper‐based feature subset selection methods with forwards search. (5) Using AdaBoost, deep forest, random forest, and XGBoost as the base classifiers embedded in CBS+ can achieve the best performance. In summary, we recommend the software testing team should employ XGBF with XGBoost as the embedded classifier in CBS+ to enhance the EADP performance.
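The basic EADP idea stated at the start of this abstract, sorting modules by defect density so the densest are inspected first, can be sketched as follows. This is a minimal illustration and not the CBS+ model; the `Module` type, its fields, and the sample values are hypothetical, and defect density is assumed here to mean predicted defects per line of code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Module:
    name: str
    loc: int                  # lines of code, used as the inspection-effort proxy (assumption)
    predicted_defects: float  # output of some defect model (hypothetical values below)

def defect_density(m: Module) -> float:
    """Predicted defects per line of code; empty modules get density 0."""
    return m.predicted_defects / m.loc if m.loc > 0 else 0.0

def rank_by_defect_density(modules: List[Module]) -> List[Module]:
    """Return modules ordered so the highest defect density is inspected first."""
    return sorted(modules, key=defect_density, reverse=True)

modules = [
    Module("A", loc=1000, predicted_defects=2.0),  # density 0.002
    Module("B", loc=100, predicted_defects=0.5),   # density 0.005
    Module("C", loc=400, predicted_defects=1.2),   # density 0.003
]

inspection_order = [m.name for m in rank_by_defect_density(modules)]
print(inspection_order)  # ['B', 'C', 'A']
```

Module A has the most predicted defects but is ranked last, because inspecting it costs far more effort per defect found than the smaller modules.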


Unsupervised Deep Domain Adaptation for Heterogeneous Defect Prediction

Gong

Jiang

et al. 2019

IEICE Trans. Inf. & Syst.


Heterogeneous defect prediction (HDP) aims to detect the largest number of defective software modules in one project by using historical data collected from other projects with different metrics. However, these data cannot be used directly because the metric sets differ among projects. Meanwhile, software data have more non-defective instances than defective instances, which may cause a significant bias towards defective instances. To address these two restrictions, we propose an unsupervised deep domain adaptation approach to build an HDP model. Specifically, we first map the data of source and target projects into a unified metric representation (UMR). Then, we design a simple neural network (SNN) model to deal with the heterogeneous and class-imbalanced problems in software defect prediction (SDP). In particular, our model introduces the Maximum Mean Discrepancy (MMD) as the distance between the source and target data to reduce the distribution mismatch, and uses the cross-entropy loss function as the classification loss. Extensive experiments on 18 public projects from four datasets indicate that the proposed approach can build an effective prediction model for HDP and outperforms the related competing approaches.
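To make the role of MMD as a distance between source and target data concrete, here is a minimal sketch of the squared MMD under a linear kernel, where it reduces to the squared distance between the two feature means. The abstract does not say which kernel the model uses, so the linear kernel, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence

def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of equally sized feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def linear_mmd2(source: Sequence[Sequence[float]],
                target: Sequence[Sequence[float]]) -> float:
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

# Toy data already mapped to a shared two-feature representation (hypothetical).
source = [[0.0, 0.0], [2.0, 2.0]]   # mean is [1, 1]
target = [[1.0, 1.0], [3.0, 3.0]]   # mean is [2, 2]

print(linear_mmd2(source, source))  # 0.0: identical samples have no mismatch
print(linear_mmd2(source, target))  # 2.0: the means differ by 1 in each feature
```

In a training setup like the one the abstract describes, a term of this kind would be added to the cross-entropy classification loss so that the network is penalized when source and target representations drift apart.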


Lina Gong

Balancing Privacy and Utility in Cross-Company Defect Prediction
