Objective: Data imbalance exists in many real-life applications. In imbalanced datasets, the under-represented minority class leads to incorrect inferences during classification, which results in more misclassification. Much research has been done to solve this issue, but so far no universally effective solution for efficient classification has been found. After analyzing the existing literature, this work proposes to minimize misclassification through genetic-algorithm-based oversampling and a deep neural network (DNN) classifier. Method: In the proposed oversampling method, synthetic samples are generated using a genetic algorithm. The initial population for the genetic algorithm is generated using the Gaussian weight initialization technique, and the fittest individuals in the population are selected by Euclidean distance for further processing. Synthetic data twice the size of the minority class are generated, and the resulting dataset is classified with the DNN. Findings: The performance of the DNN classifier on the oversampled training data is compared with that of the C4.5 and Support Vector Machine (SVM) classifiers, and the DNN classifier outperforms both. Data generated using SMOTE and ADASYN are also considered for comparison, and the proposed approach outperforms these approaches. The experiments also show that misclassification is reduced and that the improvement achieved by the proposed method is statistically significant. Novelty: The novelty of this work lies in generating the initial population by Gaussian weight initialization, selecting the fittest samples by Euclidean distance, generating synthetic samples twice the size of the minority class, and using a DNN for classification to reduce misclassification.
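The genetic oversampling step described above can be sketched as follows. This is a minimal, hypothetical illustration in Python/NumPy under stated assumptions, not the authors' implementation: "Gaussian weight initialization" is interpreted here as drawing the initial population from a Gaussian fitted to the minority class, fitness is assumed to be the negative Euclidean distance to the nearest real minority sample, and the blend crossover and Gaussian mutation operators are assumptions because the abstract does not specify them. The names `fitness` and `genetic_oversample` are illustrative, and the DNN classifier is not shown.

```python
import numpy as np


def fitness(candidates, minority):
    """Assumed fitness: negative Euclidean distance to the nearest real
    minority sample, so candidates closer to the minority class score higher."""
    # Pairwise distances, shape (n_candidates, n_minority)
    dists = np.linalg.norm(candidates[:, None, :] - minority[None, :, :], axis=2)
    return -dists.min(axis=1)


def genetic_oversample(minority, n_generations=20, pop_size=None, seed=0):
    """Hypothetical sketch: return 2 * len(minority) synthetic minority samples."""
    rng = np.random.default_rng(seed)
    n_min, n_feat = minority.shape
    n_target = 2 * n_min  # "double the minority class size"
    pop_size = pop_size or 4 * n_target

    # Assumed initialization: population drawn from a Gaussian fitted to the minority class
    mu = minority.mean(axis=0)
    sigma = minority.std(axis=0) + 1e-8
    population = rng.normal(mu, sigma, size=(pop_size, n_feat))

    for _ in range(n_generations):
        # Selection: keep the fittest half of the population as parents
        order = np.argsort(fitness(population, minority))[::-1]
        parents = population[order[: pop_size // 2]]

        # Crossover (assumed): blend random parent pairs with a random weight per child
        n_children = pop_size - len(parents)
        idx_a = rng.integers(0, len(parents), size=n_children)
        idx_b = rng.integers(0, len(parents), size=n_children)
        alpha = rng.random((n_children, 1))
        children = alpha * parents[idx_a] + (1 - alpha) * parents[idx_b]

        # Mutation (assumed): small Gaussian perturbation scaled by feature spread
        children += rng.normal(0.0, 0.05, size=children.shape) * sigma
        population = np.vstack([parents, children])

    # Keep the n_target fittest individuals as the synthetic samples
    best = np.argsort(fitness(population, minority))[::-1][:n_target]
    return population[best]


# Usage on a toy minority class of 10 samples with 4 features
rng = np.random.default_rng(1)
X_min = rng.normal(loc=3.0, size=(10, 4))
X_syn = genetic_oversample(X_min)
print(X_syn.shape)  # (20, 4)
```

The synthetic samples would then be appended to the original training data before training the classifier. Note that a purely nearest-distance fitness rewards candidates that sit close to existing samples; the crossover and mutation steps keep some diversity, but a fuller implementation might add an explicit diversity term.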
Objective: Traditional classifiers are ineffective at classifying imbalanced datasets. The most popular approach to resolving this problem is data resampling. This paper proposes a hybrid resampling method that reduces misclassification in all classes. Method: The proposed method employs the Leader algorithm for undersampling and the SMOTE algorithm for oversampling. It generates the number of samples required by the problem in both classes, which overcomes over-fitting and under-fitting issues. Findings: The proposed method is evaluated on 13 highly imbalanced datasets obtained from the KEEL repository, and the results are compared with state-of-the-art hybrid resampling methods such as SMOTE+Tomek Links, SMOTE+ENN, and SMOTE+RSB*. Of the 13 datasets, the proposed method outperforms these methods on 12 and produces the same result on the remaining one. The proposed method reduces the misclassification rates of both the minority and majority classes and is better suited to extremely imbalanced datasets. Novelty: This research introduces a novel classification approach that combines machine learning algorithms with domain-specific knowledge, resulting in significantly improved accuracy on extremely imbalanced datasets compared with traditional methods. Its distinctive feature is the use of the Leader algorithm and the SMOTE algorithm with a required resampling ratio instead of full balancing, which improves classification performance on imbalanced data.
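The hybrid Leader + SMOTE scheme described above can be sketched as follows. This is a minimal, hypothetical illustration in Python/NumPy, not the authors' implementation: the abstract does not give the Leader distance threshold, the SMOTE neighbour count, or how the required resampling ratio is chosen, so `threshold`, `k` and `n_min_target` are assumed parameters, and a binary problem is assumed. The function names `leader_undersample`, `smote` and `hybrid_resample` are illustrative.

```python
import numpy as np


def leader_undersample(X_maj, threshold):
    """Leader clustering in one pass: a point within `threshold` (Euclidean) of an
    existing leader is a follower; otherwise it becomes a new leader.
    Only the leaders are kept as the reduced majority class."""
    leaders = [X_maj[0]]
    for x in X_maj[1:]:
        if np.linalg.norm(np.asarray(leaders) - x, axis=1).min() > threshold:
            leaders.append(x)
    return np.asarray(leaders)


def smote(X_min, n_new, k=5, seed=0):
    """Basic SMOTE: interpolate between a random minority sample and one of its
    k nearest minority neighbours. Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def hybrid_resample(X, y, minority_label, threshold, n_min_target, k=5, seed=0):
    """Undersample the majority with Leader, oversample the minority with SMOTE
    up to an assumed target size rather than to full balance."""
    is_min = y == minority_label
    X_min, X_maj = X[is_min], X[~is_min]
    maj_label = y[~is_min][0]  # binary problem assumed
    X_maj_new = leader_undersample(X_maj, threshold)
    n_new = max(0, n_min_target - len(X_min))
    X_min_new = np.vstack([X_min, smote(X_min, n_new, k, seed)])
    X_res = np.vstack([X_maj_new, X_min_new])
    y_res = np.concatenate([np.full(len(X_maj_new), maj_label),
                            np.full(len(X_min_new), minority_label)])
    return X_res, y_res


# Usage on a toy 200:10 imbalanced problem
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.array([0] * 200 + [1] * 10)
X_res, y_res = hybrid_resample(X, y, minority_label=1, threshold=0.5, n_min_target=40)
print(np.bincount(y_res))  # minority count is 40; majority count depends on threshold
```

In this sketch a larger `threshold` yields fewer leaders and therefore a smaller majority class, so the two parameters together set the resampling ratio. In practice, the SMOTE step could be replaced by `SMOTE` from imbalanced-learn, whose `k_neighbors` and `sampling_strategy` parameters control the neighbourhood size and target ratio.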