2021
DOI: 10.1155/2021/3135702

An Empirical Study of Software Metrics Diversity for Cross-Project Defect Prediction

Abstract: Cross-project defect prediction (CPDP) is a mainstream method for estimating the most defect-prone components of software when historical data are limited. Several studies have investigated how software metrics are used and how modeling techniques influence prediction performance. However, the impact of software metric diversity on the predictor remains unclear. Thus, this paper aims to assess the impact of various metric sets on CPDP and investigate the feasibility of CPDP with hybrid metrics. Based on four software metric…

Cited by 6 publications (3 citation statements)
References 45 publications
“…To effectively tackle CI issues inherent in the datasets, synthetic data is generated using the SDV, a proficient tool in developing generative models within relational databases. SDV facilitates data synthesis by selectively sampling across database components post-model formulation, ensuring adherence to underlying structural constraints [31]. Moreover, the study incorporates the utilization of five classification algorithms, namely DT, LR, KNN, NB, and RF, to conduct a comprehensive assessment of defect prediction effectiveness across multiple projects.…”
Section: Methods (mentioning)
confidence: 99%
“…Within the software defect dataset, most of the data exhibits a significantly larger proportion of non-defective samples compared to defective ones [31]. CI often results in bias within machine learning models towards the majority class [35].…”
Section: Oversampling With Synthetic Data Vault (mentioning)
confidence: 99%
“…The proposed approach focused on redundancy and extracting the parameters by using principal component analysis (PCA). Yiwen Zhong et al, [17] CPDP-orientobject, semantic, and structural metrics (OSS)…”
Section: International Journal On Recent and Innovation Trends In Com... (mentioning)
confidence: 99%