2020
DOI: 10.3390/app10155164

A New Under-Sampling Method to Face Class Overlap and Imbalance

Abstract: Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum sp…

Cited by 31 publications (16 citation statements)
References 56 publications
“…In general, most real-world data include various types of noise that can degrade learning performance. In imbalanced classification problems in particular, it is known that the decision boundary becomes clearer once noisy samples in overlapped regions are identified and eliminated (Fotouhi, Asadi, and Kattan 2019; Guzmán-Ponce et al. 2020). Thus, several useful methods have been developed to identify and eliminate noisy samples in imbalanced classification problems, especially those close to the decision boundary.…”
Section: Anomaly Detection Methods
confidence: 99%
“…Karami and Johansson (2014) provided an efficient hybrid clustering method called BDE-DBSCAN, which combines the binary differential evolution (BDE) method with the DBSCAN algorithm to determine appropriate values of the parameters ε and MinPts quickly and automatically. Guzmán-Ponce et al. (2020) proposed an under-sampling method called DBMIST-US that combines DBSCAN with a minimum spanning tree (MST) algorithm, first identifying noisy samples and then cleaning borderline samples (i.e., the samples close to the decision boundary).…”
Section: Clustering Methods
confidence: 99%
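The noise-identification stage that the quoted methods build on can be illustrated with a minimal pure-Python DBSCAN. This is an illustrative sketch on toy 2-D points, not the paper's implementation; the dataset and the `dbscan` helper are assumptions, while `eps` and `min_pts` follow the standard DBSCAN parameterization.

```python
# Minimal DBSCAN sketch in pure Python: points in sparse regions are
# flagged as noise (label -1), which is the first stage of a
# DBSCAN-based cleaning step like the one described above.
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        cluster += 1                  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # expand only through core points
    return labels

# Two dense groups plus one isolated point that should be flagged as noise.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In an under-sampling pipeline of the kind cited, the points labeled -1 would be discarded before the boundary-cleaning stage.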
“…It randomly deletes samples from the majority class until it reaches the same number as the minority class. Guzmán-Ponce et al. [30] proposed a two-stage under-sampling method, which combined the DBSCAN [31] clustering algorithm with a minimum spanning tree algorithm to handle class overlap and imbalance simultaneously. Koziarski [32] proposed a method named CSMOUTE, which performs synthetic under-sampling by combining the two nearest majority instances.…”
Section: The Review Of Data Sampling
confidence: 99%
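The random under-sampling baseline mentioned above can be sketched in a few lines. The toy data, the `random_under_sample` helper, and the fixed seed are illustrative assumptions, not part of any cited method.

```python
# Random under-sampling sketch: keep all minority samples and a random
# subset of the majority class of the same size, so classes end up balanced.
import random

def random_under_sample(X, y, majority_label, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    maj = [i for i, c in enumerate(y) if c == majority_label]
    mino = [i for i, c in enumerate(y) if c != majority_label]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]

X = list(range(10))
y = [0] * 8 + [1] * 2            # 8 majority samples, 2 minority samples
Xb, yb = random_under_sample(X, y, majority_label=0)
print(yb.count(0), yb.count(1))  # → 2 2
```

Because the deleted majority samples are chosen blindly, informative boundary samples may be lost, which is the weakness the clustering-based methods above aim to avoid.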
“…Roy et al. [32] combined SMOTE-Tomek to balance the Pima diabetes dataset using an ANN and achieved an accuracy of 98%. Guzmán-Ponce et al. [11] proposed a two-stage under-sampling strategy that combines DBSCAN clustering, to eliminate noisy samples, with a minimum spanning tree (MST) algorithm that refines the decision boundary, in order to deal with class imbalance.…”
Section: Related Work
confidence: 99%
“…Most resampling methods rely on the k-nearest-neighbor (KNN) rule [7, 10], either by eliminating instances of both classes that are far from the decision boundary to reduce duplication, as in condensing, or by removing those close to the boundary for generalization, as in filtering [11]. Similarly, Tomek links are used to eliminate instances from the majority class since, if two examples form a Tomek link, either one of them is noise or both are borderline.…”
Section: Introduction
confidence: 99%
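The Tomek-link rule quoted above can be sketched directly: two samples form a Tomek link when they are each other's nearest neighbor yet carry different class labels, and the majority-class member of each link is then dropped to clean the boundary. The toy 1-D data and helper names here are assumptions for illustration, not the cited implementation.

```python
# Tomek-link detection and majority-side removal on toy 1-D data.
from math import dist

def nearest(i, X):
    """Index of the nearest neighbor of sample i (excluding itself)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))

def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbors of different classes."""
    links = []
    for i in range(len(X)):
        j = nearest(i, X)
        if y[i] != y[j] and nearest(j, X) == i and i < j:
            links.append((i, j))
    return links

# Majority class 0 overlaps minority class 1 near x = 2.
X = [(0.0,), (0.5,), (2.0,), (4.0,), (2.1,), (5.0,)]
y = [0, 0, 0, 1, 1, 1]
links = tomek_links(X, y)        # → [(2, 4)]
# Drop the majority (class 0) side of each link to clean the boundary.
drop = {i if y[i] == 0 else j for i, j in links}
X_clean = [x for k, x in enumerate(X) if k not in drop]
print(links, drop)
```

Only the overlapping majority sample at x = 2.0 is removed; samples far from the boundary are untouched, matching the filtering behavior described in the quote.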