Cross-project software defect prediction improves the quality of new software projects or projects with a shortage of historical data, and various data mining techniques have been recommended in this field. Classification accuracy is considered one of the most significant problems because historical data are scarce and heterogeneous. To address this challenge, this research uses the spotted hyena optimizer algorithm as a classifier to predict defects across projects. Confidence and support form a multi-objective fitness function that is used to search for the best classification rules. These classification rules are then used to predict defects in new projects or in other projects with insufficient data. The NASA datasets JM1, KC1, and KC2 are used. By applying the spotted hyena optimizer algorithm as a classifier on one dataset and predicting defects in the other two datasets, accuracies of 84.6, 92.0, 82.4, 90.7, 86.6, and 81.8 are reported for JM1, KC1, and KC2, respectively. These accuracy values are better than those of the most significant data mining techniques in the field, such as Support Vector Machine, Naïve Bayes, Boosting, C4.5, and Bagging. The research also discusses other performance measures such as precision, recall, and f-measure. The study concludes that many McCabe and Halstead features have a strong impact on generating highly accurate defect predictors, such as McCabe's line count of code, McCabe's cyclomatic complexity, McCabe's essential complexity, McCabe's design complexity iv, Halstead's effort, Halstead's time estimator, Halstead's line count, Halstead's count of lines of comments, and total operators.
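As an illustration of how support and confidence can score a candidate classification rule, the following is a minimal sketch and not the authors' implementation. The `Rule` class, the threshold values, the sample modules, and the equal-weight sum in `fitness` are all assumptions made for this example; the abstract does not state how the two objectives are combined. The keys `loc` and `v(g)` stand for McCabe's line count and cyclomatic complexity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Module = Dict[str, float]  # metric name -> value for one software module


@dataclass
class Rule:
    """Hypothetical rule: fires when every metric lies inside its [low, high] range."""
    conditions: Dict[str, Tuple[float, float]]
    predicted_defective: bool

    def covers(self, module: Module) -> bool:
        return all(low <= module[name] <= high
                   for name, (low, high) in self.conditions.items())


def support_confidence(rule: Rule, modules: List[Module],
                       labels: List[bool]) -> Tuple[float, float]:
    """Support: share of all modules the rule covers and classifies correctly.
    Confidence: share of covered modules the rule classifies correctly."""
    covered = [label for m, label in zip(modules, labels) if rule.covers(m)]
    correct = sum(1 for label in covered if label == rule.predicted_defective)
    support = correct / len(modules) if modules else 0.0
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


def fitness(rule: Rule, modules: List[Module], labels: List[bool],
            w_support: float = 0.5, w_confidence: float = 0.5) -> float:
    # Assumption: the two objectives are merged with a weighted sum.
    support, confidence = support_confidence(rule, modules, labels)
    return w_support * support + w_confidence * confidence


# Made-up modules and labels (True = defective), for illustration only.
modules = [
    {"loc": 120, "v(g)": 15},
    {"loc": 30, "v(g)": 2},
    {"loc": 200, "v(g)": 22},
    {"loc": 150, "v(g)": 12},
]
labels = [True, False, True, False]

rule = Rule(
    conditions={"loc": (100, float("inf")), "v(g)": (10, float("inf"))},
    predicted_defective=True,
)

support, confidence = support_confidence(rule, modules, labels)
score = fitness(rule, modules, labels)
print(support, confidence, score)
```

In a cross-project setting, rules scored this way on one dataset would be applied unchanged to modules from another dataset; an optimizer such as the spotted hyena algorithm would search over the rule thresholds to maximize the fitness.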
Software engineering companies strive to improve software quality by predicting defect-prone software modules. Although various data mining methods have been developed, unstable accuracy rates remain a critical issue owing to the imbalanced nature and high dimensionality of software defect datasets. To deal with this issue, we propose using the spotted hyena optimizer, a novel meta-heuristic optimization algorithm, to predict software defects. The support and confidence of classification rules are the basis of a multi-objective fitness function that lets the spotted hyena algorithm serve as a classifier by finding the fittest classification or standard rules among individuals. Experiments were conducted on four NASA software datasets: JM1, KC2, KC1, and PC3. The spotted hyena classifier provides accuracies of 85.2%, 84%, 89.6%, and 81.8%, respectively, for these datasets. These accuracy rates are better than those achieved using other popular data mining techniques. We also discuss other classification measures, such as precision, recall, and confusion matrices, in connection with the experimental results. Moreover, a Gaussian mixture model is used to study the uncertainty quantification of the proposed classifier. The study demonstrated the feasible performance of the spotted hyena classifier in four different case studies.
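The classification measures mentioned above can all be derived from a binary confusion matrix. The sketch below shows one way to compute them; the function names and the sample labels are made up for this example and are not taken from the paper's experiments.

```python
from typing import Dict, List


def confusion_counts(actual: List[bool], predicted: List[bool]) -> Dict[str, int]:
    """Count true/false positives and negatives, where True means 'defective'."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["tp"] += 1
        elif p and not a:
            counts["fp"] += 1
        elif not p and not a:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def classification_measures(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision, and recall; a measure is 0.0 when its denominator is 0."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Made-up labels for eight modules (True = defective), for illustration only.
actual = [True, True, False, False, False, True, False, False]
predicted = [True, False, False, False, True, True, False, False]

counts = confusion_counts(actual, predicted)
measures = classification_measures(counts)
print(counts)
print(measures)
```

On imbalanced defect datasets, accuracy alone can look high even when few defective modules are found, which is why precision and recall are usually reported alongside it.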
In this article, affiliation no. 1 was incorrectly shown as the last author's first affiliation. The original article has been corrected.