Classification over numerous real-world datasets suffers from a drawback known as the class imbalance problem. A dataset is said to be imbalanced when the majority class has significantly more samples than the minority class. This drawback degrades the performance of data classification techniques. Classification is a supervised learning method that uses a training dataset to build a model for classifying unseen examples. In imbalanced data classification, however, the class boundary learned by standard machine learning algorithms can be severely skewed toward the majority class; as a result, the false-negative rate can be excessively high. This research focuses on imbalanced data classification using the uncertain Nearest Neighbor (NN) decision rule and identifies the major issues faced by k-Nearest Neighbor (k-NN). Given a dataset, tuning k to improve the performance of k-NN is a tedious task. Because of class imbalance, the performance of k-NN decreases, since a query's neighborhood mixes dissimilar characteristics from different classes. This paper addresses these issues by developing the Adaptive k-Condensed NN (Ada-CNN) classifier. Ada-CNN exploits the distribution and density of a test point's neighborhood and learns an appropriate point-specific k using artificial neural networks. Ada-CNN performed well compared to k-NN and other well-known classifiers: the experimental results showed that it achieved nearly 94% accuracy on the Diabetes dataset and 100% accuracy on the pop-failure dataset, outperforming k-NN for imbalanced classification.
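The point-specific k idea can be illustrated with a toy heuristic. The sketch below is not the abstract's neural-network learner: it simply picks k for each query by leave-one-out accuracy over the query's local neighborhood, as a cheap stand-in. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, labels, x, k):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], x))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def point_specific_k(train, labels, x, candidate_ks=(1, 3, 5), m=4):
    """Pick k by leave-one-out accuracy over the m training points nearest x.
    A cheap heuristic stand-in for Ada-CNN's learned, point-specific k."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], x))
    local = order[:m]
    best_k, best_acc = candidate_ks[0], -1.0
    for k in candidate_ks:
        hits = 0
        for i in local:
            rest = [j for j in range(len(train)) if j != i]
            pred = knn_predict([train[j] for j in rest],
                               [labels[j] for j in rest], train[i], k)
            hits += pred == labels[i]
        if hits / len(local) > best_acc:
            best_k, best_acc = k, hits / len(local)
    return best_k

# Toy imbalanced data: four majority-class (0) points, two minority-class (1).
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0)]
labels = [0, 0, 0, 0, 1, 1]
x = (1.05, 1.0)                          # query lying in the minority region
k = point_specific_k(train, labels, x)
print(knn_predict(train, labels, x, k))  # → 1 (a fixed k=5 would vote class 0)
```

With a fixed k as large as 5, the four majority-class points outvote the minority class for this query; choosing k per point avoids that failure mode, which is the intuition behind an adaptive k under imbalance.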
In many datasets, one class is heavily outnumbered by the others; such data are known as class-imbalanced data. Many standard learning algorithms suffer degraded classification performance on imbalanced data. The issue can be addressed by conventional methods such as cost-sensitive learning, sampling, or ensemble methods, but these alter the original data distribution, which discards useful information and may cause unexpected errors or aggravate overfitting. In this research, a local Mahalanobis distance learning (LMDL) method is applied to the nearest neighbor (NN) classifier to improve classification performance on imbalanced datasets. LMDL uses multiple distance metrics to investigate the data effectively and obtain the relevant features from the analysis. The distance metrics are learned from the original data, along with prototypes that support the NN classifier. A number of experiments on various datasets are conducted to validate the quality and efficiency of the proposed LMDL method. The experimental results show that LMDL achieved nearly 82% on the E. coli dataset, 94% on the breast cancer dataset, and 98% on the Iris dataset across all metrics: accuracy, precision, recall, and F-measure.
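A minimal sketch of the local-metric idea, assuming 2-D data and using each class's mean as its prototype: every class gets its own Mahalanobis metric derived from its own covariance, and a query is assigned to the class whose prototype is nearest under that class's metric. The helper names and toy data are hypothetical, and this is only an illustration of the principle, not the authors' exact LMDL procedure.

```python
def mean2(pts):
    """Mean of a list of 2-D points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def cov2(pts, mu):
    """Entries (sxx, sxy, syy) of the 2x2 sample covariance about mu."""
    n = len(pts)
    sxx = sum((p[0] - mu[0]) ** 2 for p in pts) / n
    syy = sum((p[1] - mu[1]) ** 2 for p in pts) / n
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in pts) / n
    return sxx, sxy, syy

def mahalanobis_sq(x, mu, cov, ridge=1e-6):
    """Squared Mahalanobis distance from x to mu under covariance cov."""
    sxx, sxy, syy = cov
    sxx += ridge                 # small ridge keeps a flat covariance invertible
    syy += ridge
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # d^T * inverse(cov) * d, with the 2x2 inverse written out explicitly
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def classify(x, classes):
    """Assign x to the class whose mean (prototype) is nearest under that
    class's own local Mahalanobis metric."""
    best_label, best_d = None, float("inf")
    for label, pts in classes.items():
        mu = mean2(pts)
        d = mahalanobis_sq(x, mu, cov2(pts, mu))
        if d < best_d:
            best_label, best_d = label, d
    return best_label

# Toy data: class 0 is spread widely along the x-axis, class 1 is a tight
# vertical pair. The query (3.5, 1.5) is Euclidean-closer to class 1's mean,
# but class 0's elongated covariance makes it far more plausible there.
classes = {
    0: [(0.0, 0.0), (2.0, 0.2), (4.0, -0.2), (6.0, 0.0)],
    1: [(3.0, 2.0), (3.0, 3.0)],
}
print(classify((3.5, 1.5), classes))   # → 0
```

The example shows why a single global metric can mislead NN on imbalanced, differently shaped classes: distances are only comparable once each class's local geometry is taken into account.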