Outliers are data points that deviate markedly from the rest of the data. Clustering in the presence of outliers has received considerable attention in the data processing community, because outliers disproportionately degrade the quality of the results that popular clustering algorithms produce while searching for an optimal solution. In this work, we propose a novel method that classifies each data point in a dataset with grouping characteristics as either an outlier or an inlier. The method uses both the distance and the density of a data point with respect to the rest of the data: distances identify points at the extremities, while densities identify points lying in the sparsest regions. Further, every data model must generalize in order to work robustly in situations it was not designed for, so our approach also builds generalization into the model. Performance is measured using the area under the ROC curve (AUC). The highest AUC, 0.90, was obtained on the cardioto dataset, and the second highest, 0.52, on the Spambase dataset; several other datasets are used to demonstrate the proposed model.
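The idea of combining a distance cue (extremity) with a density cue (sparsity) and scoring the result with AUC can be illustrated with a small sketch. This is not the authors' method. The specific choices are assumptions made for illustration: distance to the dataset centroid stands in for "extremity", the mean distance to the k nearest neighbours stands in for inverse density, the two are min-max normalised and averaged with equal weights, and the hypothetical functions `outlier_scores` and `auc` are invented names. The generalization component described above is not modelled.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def minmax(xs: List[float]) -> List[float]:
    """Rescale values to [0, 1]; constant inputs map to 0."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def outlier_scores(points: Sequence[Point], k: int = 3) -> List[float]:
    """Hypothetical score: higher means more likely to be an outlier.

    Assumption: extremity = distance to the centroid; sparsity = mean
    distance to the k nearest neighbours (a proxy for low density).
    """
    n, dim = len(points), len(points[0])
    centroid = tuple(sum(p[d] for p in points) / n for d in range(dim))

    # Distance cue: points far from the centroid sit at the extremities.
    extremity = [math.dist(p, centroid) for p in points]

    # Density cue: a large mean k-NN distance means a sparse neighbourhood.
    sparsity = []
    for i, p in enumerate(points):
        ds = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        sparsity.append(sum(ds[:k]) / k)

    # Assumed equal weighting of the two normalised cues.
    return [0.5 * a + 0.5 * b for a, b in zip(minmax(extremity), minmax(sparsity))]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random outlier (label 1) outscores
    a random inlier (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: a tight group near the origin plus one distant point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    labels = [0, 0, 0, 0, 0, 1]
    scores = outlier_scores(data)
    print(auc(scores, labels))  # → 1.0
```

The distant point has both the largest centroid distance and the sparsest neighbourhood, so it receives the top score and the AUC is 1.0 on this toy example. On real data the weighting between the two cues, and the choice of k, would need tuning.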