Heterogeneous Defect Prediction via Exploiting Correlation Subspace

Cheng, Ming; Wu, Guo‐Rong; Jiang, Min; Wan, Hongyan; You, Guoan; Yuan, Mengting

doi:10.18293/seke2016-090

Cited by 23 publications (23 citation statements)

References 22 publications (35 reference statements)


“…To be fair, we choose LR classifier for all compared methods except for CCT‐SVM, which uses the default SVM classifier. Comparison of CLSUP and CCT‐SVM using the SVM classifier will be reported in Section 4.5.4.…”

Section: Methods (mentioning)


“…Recently, heterogeneous fault prediction (HFP) models have been presented to predict faults across projects with heterogeneous metric sets, i.e., source and target projects have different metric sets. For example, Jing et al. presented CCA+, a transfer method based on canonical correlation analysis (CCA) that uses a unified metric representation (UMR) and a CCA-based transfer learning technique for HFP.…”

Section: Introduction (mentioning)


“…Although existing HFP methods have achieved promising results, the following issues still have not been well studied. (1) Mixed project data problem: In the early phases of software testing, projects may have only a small amount of historical defect data (training target data).…”

Section: Introduction (mentioning)



Heterogeneous fault prediction with cost‐sensitive domain adaptation

Jing

Zhu

2018

Software Testing Verif & Rel


Summary: In the early phases of software testing, projects may have only limited historical defect data. Learning a prediction model from such insufficient training data limits the efficacy of the learned predictor. In practice, many fault prediction datasets are publicly available, and heterogeneous fault prediction (HFP) has recently been proposed to exploit them. However, existing HFP models do not investigate how to use mixed project data to predict the target project. Furthermore, defect data are often imbalanced: the imbalanced class distribution of the source data usually leads to serious misclassification of fault‐prone instances, which degrades the predictor's performance, and existing HFP methods do not consider the class imbalance problem in the training stage. In this paper, we propose a novel Cost‐sensitive Label and Structure‐consistent Unilateral Projection (CLSUP) approach for HFP. CLSUP not only makes better use of within‐project and cross‐project data but also alleviates the class imbalance problem by setting different misclassification costs for fault‐prone and non–fault‐prone instances. Extensive experiments on 30 projects demonstrate the effectiveness of CLSUP.
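The cost-sensitive idea in this abstract, assigning a larger misclassification cost to fault-prone instances than to non-fault-prone ones, can be illustrated in isolation. The sketch below is not CLSUP itself (which learns a projection); it is a minimal Python illustration under stated assumptions: the fault-prone cost is set to the imbalance ratio (#non-fault-prone / #fault-prone), which is an assumed choice rather than one taken from the paper, and the helper names `class_costs`, `weighted_log_loss`, and `predict_min_cost` are hypothetical.

```python
import math
from typing import List, Tuple


def class_costs(labels: List[int]) -> Tuple[float, float]:
    """Return (cost_fault_prone, cost_non_fault_prone).

    Assumption (not from the paper): missing a fault-prone instance costs
    the imbalance ratio #non-fault-prone / #fault-prone; a false alarm costs 1.0.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no fault-prone instances to weight")
    return n_neg / n_pos, 1.0


def weighted_log_loss(labels: List[int], probs: List[float],
                      cost_pos: float, cost_neg: float) -> float:
    """Mean cross-entropy where each term is scaled by its class's cost."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += cost_pos * -math.log(p)
        else:
            total += cost_neg * -math.log(1.0 - p)
    return total / len(labels)


def predict_min_cost(p_fault: float, cost_pos: float, cost_neg: float) -> int:
    """Pick the label with the lower expected misclassification cost."""
    # Predicting "non-fault-prone" risks a missed fault: expected cost p * cost_pos.
    # Predicting "fault-prone" risks a false alarm: expected cost (1 - p) * cost_neg.
    return 1 if p_fault * cost_pos > (1.0 - p_fault) * cost_neg else 0


if __name__ == "__main__":
    labels = [1, 0, 0, 0]
    cp, cn = class_costs(labels)
    print(cp, cn, predict_min_cost(0.3, cp, cn), predict_min_cost(0.3, 1.0, 1.0))
```

With labels `[1, 0, 0, 0]`, the fault-prone cost becomes 3.0, so an instance with predicted fault probability 0.3 is labeled fault-prone (expected cost 0.9 for missing it versus 0.7 for a false alarm), whereas with equal costs the same instance would be labeled non-fault-prone.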



“…The second mainstream way is to design effective defect predictors based on transfer learning techniques (e.g., [7,10,15,16,17,18,19]). For instance, Ma et al. [15] proposed the Transfer Naïve Bayes (TNB) model.…”

Section: A Defect Prediction (mentioning)


“…Another challenge in CCDP is that the metric sets of the source company data and the target company data are usually heterogeneous. Jing et al. [7] and Chen et al. [19] proposed effective solutions for heterogeneous cross-company defect prediction.…”

Section: A Defect Prediction (mentioning)


Combining Data Filter and Data Sampling for Cross-Company Defect Prediction: An Empirical Study

Zhang et al.

2017

International Conference on Software Engineering and Knowledge Engineering


Abstract: Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects from a source company and then applies the model to a target company. Unfortunately, large amounts of irrelevant cross-company (CC) data usually make it difficult to build a high-performance prediction model. Moreover, CC data are highly imbalanced between the defect-prone and non-defective classes, which degrades the performance of CCDP. To address these issues, this paper proposes an approach that combines data sampling with data filtering. Data sampling seeks a more balanced dataset by adding or removing instances, while data filtering removes irrelevant CC data so that the performance of CCDP models can be improved. We employ two data filtering methods, NN filter and DBSCAN filter, combined with SMOTE (Synthetic Minority Oversampling Technique) and RUS (Random UnderSampling). Combining these four techniques produces eight approaches: (1) NN filter performed prior to RUS; (2) NN filter performed after RUS; (3) NN filter performed prior to SMOTE; (4) NN filter performed after SMOTE; (5) DBSCAN filter performed prior to RUS; (6) DBSCAN filter performed after RUS; (7) DBSCAN filter performed prior to SMOTE; (8) DBSCAN filter performed after SMOTE. The empirical study was carried out on 15 publicly available project datasets. The experimental results demonstrate that performing the NN filter prior to RUS (Approach 1) outperforms the other seven approaches.
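Approach 1 (NN filter prior to RUS) can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions: the NN filter keeps, for every target instance, its k nearest CC instances by Euclidean distance (k = 10 by default here, an assumed value), and RUS drops randomly chosen majority-class instances until both classes are equally sized. The function names `nn_filter`, `random_undersample`, and `nn_then_rus` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# A labeled CC instance: (metric vector, label) with label 1 = defective.
Instance = Tuple[Sequence[float], int]


def nn_filter(source: List[Instance], target_x: List[Sequence[float]],
              k: int = 10) -> List[Instance]:
    """Keep the union of the k nearest source instances of each target instance.

    Assumption: Euclidean distance on raw metric values; k = 10 is an assumed default.
    """
    selected = set()
    for t in target_x:
        order = sorted(range(len(source)), key=lambda i: math.dist(source[i][0], t))
        selected.update(order[:k])
    # Return survivors in their original order.
    return [source[i] for i in sorted(selected)]


def random_undersample(data: List[Instance], seed: int = 0) -> List[Instance]:
    """Randomly drop majority-class instances until both classes have equal size."""
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # seeded for reproducibility
    return minority + rng.sample(majority, len(minority))


def nn_then_rus(source: List[Instance], target_x: List[Sequence[float]],
                k: int = 10, seed: int = 0) -> List[Instance]:
    """Approach 1: filter irrelevant CC data first, then balance what remains."""
    return random_undersample(nn_filter(source, target_x, k), seed)


if __name__ == "__main__":
    source = [((0.0,), 0), ((0.1,), 0), ((0.2,), 1),
              ((5.0,), 0), ((5.1,), 1), ((9.0,), 0)]
    target_x = [(0.05,)]
    print(nn_filter(source, target_x, k=3))
    print(nn_then_rus(source, target_x, k=3))
```

Filtering before sampling means RUS only discards instances that already lie near the target data, which is one plausible reading of why this ordering could help; the paper's own explanation may differ.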


Feature Representation and Feature Matching for Heterogeneous Defect Prediction

Mon

2019

Computer and Information Science

