An Empirical Study of Software Metrics Diversity for Cross-Project Defect Prediction

Zhong, Yiwen; Song, Kun Woen; Lv, ShengKai; He, Peng

doi:10.1155/2021/3135702

Cited by 6 publications

(3 citation statements)

References 45 publications

“…To effectively tackle CI issues inherent in the datasets, synthetic data is generated using the SDV, a proficient tool in developing generative models within relational databases. SDV facilitates data synthesis by selectively sampling across database components post-model formulation, ensuring adherence to underlying structural constraints [31]. Moreover, the study incorporates the utilization of five classification algorithms, namely DT, LR, KNN, NB, and RF, to conduct a comprehensive assessment of defect prediction effectiveness across multiple projects.…”

Section: Methods (mentioning)

confidence: 99%

Impact of a Synthetic Data Vault for Imbalanced Class in Cross-Project Defect Prediction

Putri Nabella,

Rudy Herteno,

Setyo Wahyu Saputro

et al. 2024

j.electron.electromedical.eng.med.inform

Software Defect Prediction (SDP) is crucial for ensuring software quality. However, class imbalance (CI) poses a significant challenge in predictive modeling. This study examines the effectiveness of the Synthetic Data Vault (SDV) in mitigating CI within Cross-Project Defect Prediction (CPDP). Methodologically, the study addresses CI across the ReLink, MDP, and PROMISE datasets by using SDV to augment the minority classes. Classification uses Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF), and model performance is evaluated using AUC and the t-test. The results consistently show that SDV performs better than SMOTE and other techniques across projects, with statistically significant improvements. KNN achieves the highest average AUC, with values of 0.695, 0.704, and 0.750. On ReLink, KNN shows a 16.06% improvement over the imbalanced data and 12.84% over SMOTE. Similarly, on MDP, KNN shows a 20.71% improvement over the imbalanced data and 10.16% over SMOTE, and on PROMISE, a 13.55% improvement over the imbalanced data and 7.01% over SMOTE. RF displays moderate performance, closely followed by LR and DT, while NB lags behind. The statistical significance of these findings is confirmed by the t-test, with all p-values below the 0.05 threshold. These findings underscore SDV's potential for enhancing CPDP outcomes and tackling CI challenges in SDP, with KNN as the best classification algorithm. SDV could prove to be a promising tool for enhancing defect detection and CI mitigation.
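The abstract reports results as average AUC and as percentage improvements over a baseline. The sketch below shows one common way to compute such figures, using only the Python standard library. It is an illustration, not the cited study's code: the function names `auc` and `relative_improvement`, the toy labels and scores, and the assumption that "improvement" means relative change against the baseline AUC are all hypothetical, since the abstract does not state its formula.

```python
def auc(labels, scores):
    """Rank-based AUC: the fraction of (defective, clean) pairs in which
    the defective module receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_improvement(new, baseline):
    """Percentage change of `new` relative to `baseline` (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Toy data, not taken from any of the cited datasets.
labels = [1, 0, 1, 0, 0, 1]              # 1 = defective, 0 = clean
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]  # predicted defect probabilities

print(f"AUC = {auc(labels, scores):.3f}")
print(f"Improvement = {relative_improvement(0.75, 0.60):.2f}%")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration. For real datasets a library routine would be the usual choice.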

“…Within the software defect dataset, most of the data exhibits a significantly larger proportion of non-defective samples compared to defective ones [31]. CI often results in bias within machine learning models towards the majority class [35].…”

Section: Oversampling With Synthetic Data Vault (mentioning)

confidence: 99%
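The majority-class bias described in this citation statement can be seen in a small, self-contained sketch. The 10% defect rate and the trivial "always predict clean" model are hypothetical choices for illustration and are not taken from the cited study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total


# Hypothetical imbalanced dataset: 10 defective modules, 90 clean ones.
y_true = [1] * 10 + [0] * 90

# A degenerate model that always predicts the majority (clean) class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("recall on defective class:", recall(y_true, y_pred))
```

The degenerate model scores high accuracy while finding no defective modules. This is why imbalance-aware measures such as AUC, and resampling of the minority class, are commonly used in defect prediction.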

“…The proposed approach focused on redundancy and extracting the parameters by using principal component analysis (PCA). Yiwen Zhong et al, [17] CPDP-orientobject, semantic, and structural metrics (OSS)…”

Section: International Journal On Recent and Innovation Trends In Com... (mentioning)

confidence: 99%

Quality Analysis of Software Applications using Software Reliability Growth Models and Deep Learning Models

Sujitha,

Subrahmanyam

2023

IJRITCC

Finding faults in software is a tedious task. Many software companies try to develop high-quality, fault-free software, so it is important to analyze errors, faults, and bugs during software development. Software reliability growth models (SRGMs) help the software industry create quality software products. Quality is the software metric used to analyze the performance of a software product; a product with no errors or faults is considered the best. SRGMs are also used to analyze software quality based on the programming language. Deep Learning (DL) is a subfield of machine learning used to solve several complex issues in software development. Finding accurate patterns in software faults is difficult. Integrating SRGMs with DL approaches gives better results for software fault detection. Many real-time software fault datasets are available for analyzing DL approaches. The performance of the various integrated models is analyzed using quality metrics.

Prediction of Defective Artifacts by Removing Redundant Metrics in Software Development Life Cycle (SDLC)

Gupta,

2023

2023 6th International Conference on Contemporary Computing and Informatics (IC3I)
