2010
DOI: 10.1007/s00726-010-0595-2

An approach for classification of highly imbalanced data using weighting and undersampling

Abstract: Real-world datasets commonly have issues with data imbalance. There are several approaches such as weighting, sub-sampling, and data modeling for handling these data. Learning in the presence of data imbalances presents a great challenge to machine learning. Techniques such as support-vector machines have excellent performance for balanced data, but may fail when applied to imbalanced datasets. In this paper, we propose a new undersampling technique for selecting instances from the majority class. The performa…
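
The paper's own selection heuristic is only hinted at in the truncated abstract, so the following Python sketch shows plain random undersampling of the majority class as a stand-in; the function name, the `ratio` parameter, and the random selection strategy are illustrative assumptions, not the authors' method.

```python
import random

def undersample_majority(X, y, majority_label, ratio=1.0, seed=42):
    """Hypothetical helper: randomly keep only `ratio` majority
    examples per minority example. The paper proposes an informed
    selection of majority instances; this random version is a
    minimal baseline sketch, not the authors' technique."""
    rng = random.Random(seed)
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    min_idx = [i for i, label in enumerate(y) if label != majority_label]
    n_keep = min(len(maj_idx), int(ratio * len(min_idx)))
    keep = rng.sample(maj_idx, n_keep)
    kept = sorted(keep + min_idx)
    return [X[i] for i in kept], [y[i] for i in kept]
```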

Cited by 144 publications (74 citation statements)
References 32 publications

“…Experiments done on different methods conclude with ambiguous results: while Anand et al [63], corroborated by Li et al [15], opt for sampling methods as the optimal solution, we observe on the other front McCarthy et al [65], in agreement with Liu et al [88], on the superiority of cost-sensitive learning; Quinlan [94] and Thomas [100] endorse ensemble learning methods; Cieslak [38] and Marcellin [90], on the other hand, defend the algorithm-modification approaches.…”
Section: Discussion
confidence: 99%
“…As a result, accuracy is not used to evaluate the performance of classifiers on imbalanced datasets, and more reasonable evaluation metrics should be adopted [33,34].…”
Section: Evaluation Measures
confidence: 99%
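
A toy Python example (invented numbers) makes the quoted point concrete: on a 99:1 dataset, a degenerate classifier that always predicts the majority class scores 99% accuracy while detecting no positives at all.

```python
# 990 majority (negative) and 10 minority (positive) examples;
# the classifier predicts the majority class for everything.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.99 -- looks excellent, yet not one positive is detected
```
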
“…In medical science, bioinformatics, and machine learning communities [23,24,33,34], the sensitivity (SE) and the specificity (SP) are two metrics used to evaluate the performance of classifiers. Sensitivity measures the proportion of actual positives which are correctly identified as such, while specificity can be defined as the proportion of negatives which are correctly identified.…”
Section: Evaluation Measures
confidence: 99%
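
From the definitions quoted above, both metrics follow directly from confusion-matrix counts. A minimal sketch (the helper name and the `positive` label parameter are assumptions for illustration):

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity SE = TP / (TP + FN); specificity SP = TN / (TN + FP)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return se, sp
```
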
“…This measure tries to maximize the accuracy of both classes while keeping the two accuracies balanced. Several researchers have used this metric for evaluating classifiers on imbalanced datasets (Kubat and Matwin 1997; Robert et al 1997; Wu and Chang 2003; Anand et al 2010). We also utilize this metric to evaluate the SVM classifier for the highly imbalanced γ-turn dataset, and modify the evaluation criterion of LibSVM (Chang and Lin 2001) to use the G-mean metric in this study.…”
Section: Imbalanced Problem
confidence: 99%
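
The G-mean referenced in this statement is the geometric mean of sensitivity and specificity, G = sqrt(SE * SP), so it is high only when both classes are classified well. A sketch building on the `sensitivity_specificity` helper above (same illustrative assumptions):

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(SE * SP). It drops to zero whenever either
    class is missed entirely, unlike plain accuracy."""
    se, sp = sensitivity_specificity(y_true, y_pred, positive)
    return math.sqrt(se * sp)

# On the all-negative toy predictor above, SE = 0 and hence G-mean = 0,
# exposing the failure that 99% accuracy conceals.
```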