Class imbalance is a long-standing problem in machine learning that hinders the performance of classification algorithms in real-world applications such as electricity pilferage detection, fraudulent transaction detection, anomaly detection, and prediction of rare diseases. It refers to the situation in which the distribution of samples is skewed or biased towards one particular class. Software fault prediction datasets fall into this category by their intrinsic nature, because they contain fewer defective modules than non-defective modules. The majority of the oversampling techniques proposed to address this issue balance the dataset by generating synthetic samples of the minority class. However, the synthetic samples generated are near duplicates, which also results in over-generalization. We therefore propose a novel oversampling approach that introduces synthetic samples using a genetic algorithm (GA). GA is a form of evolutionary algorithm that employs biologically inspired techniques such as inheritance, mutation, selection, and crossover. The proposed algorithm generates synthetic samples of the minority class based on a distribution measure and ensures that the samples are diverse within the class and efficient. The proposed oversampling algorithm is compared with SMOTE, B-SMOTE, ADASYN, Random Oversampling, MAHAKIL, and a no-sampling approach on 20 defect prediction datasets from the Promise repository using five prediction models. The results indicate that the GA-based oversampling approach improves fault prediction performance and reduces the false alarm rate.
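To make the idea of GA-based oversampling concrete, the following is a minimal illustrative sketch and not the algorithm proposed in this work. The abstract does not specify the distribution measure or the exact operators, so everything here is an assumption: the function name `ga_oversample`, its parameters, the fitness (distance to the nearest real minority sample, used as a stand-in for diversity), binary tournament selection, uniform crossover, and Gaussian mutation are all hypothetical choices made for illustration.

```python
import numpy as np


def ga_oversample(X_min, n_new, n_generations=20, mutation_scale=0.05, seed=0):
    """Toy GA that evolves n_new synthetic minority samples (illustrative only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    lo, hi = X_min.min(axis=0), X_min.max(axis=0)
    # Per-feature range, used to scale mutation noise (1.0 for constant features).
    spread = np.where(hi > lo, hi - lo, 1.0)

    def fitness(pop):
        # Assumed diversity score: distance to the nearest real minority sample.
        dists = np.linalg.norm(pop[:, None, :] - X_min[None, :, :], axis=2)
        return dists.min(axis=1)

    # Initial population: real minority samples drawn with replacement.
    pop = X_min[rng.integers(0, len(X_min), size=n_new)]

    for _ in range(n_generations):
        fit = fitness(pop)

        # Selection: binary tournament, run twice to get two parents per child.
        a, b = rng.integers(0, n_new, size=(2, n_new))
        parents1 = pop[np.where(fit[a] >= fit[b], a, b)]
        c, d = rng.integers(0, n_new, size=(2, n_new))
        parents2 = pop[np.where(fit[c] >= fit[d], c, d)]

        # Crossover: each feature is inherited from one parent at random.
        mask = rng.random(parents1.shape) < 0.5
        children = np.where(mask, parents1, parents2)

        # Mutation: small Gaussian noise, clipped to the minority bounding box.
        noise = rng.normal(0.0, mutation_scale, size=children.shape) * spread
        children = np.clip(children + noise, lo, hi)

        # Survivor selection: keep the n_new fittest of parents and children.
        merged = np.vstack([pop, children])
        keep = np.argsort(fitness(merged))[::-1][:n_new]
        pop = merged[keep]

    return pop


if __name__ == "__main__":
    # Hypothetical minority-class feature vectors (two features each).
    X_minority = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [0.2, 0.8]])
    synthetic = ga_oversample(X_minority, n_new=6)
    print(synthetic.shape)
```

In this sketch the synthetic samples stay inside the bounding box of the real minority samples while being pushed away from exact copies of them, which is one simple way to avoid the near-duplicate problem described above. A faithful implementation would replace the assumed fitness with the distribution measure used by the proposed method.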