Cluster-based under-sampling approaches for imbalanced data distributions

Yen, Show‐Jane; Lee, Yue‐Shi

doi:10.1016/j.eswa.2008.06.108

Cited by 587 publications

(262 citation statements)

References 10 publications

Supporting

Mentioning

232

Contrasting

Unclassified

“…The last method "average farthest" is similar to the "average nearest" method; it selects the majority class samples which have the farthest average distances from all the minority class samples. These under-sampling approaches based on distance expend a lot of time selecting the majority class samples in the large dataset, and they are not efficient in real applications [7].…”

Section: Introduction (mentioning)

confidence: 99%
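
The "average farthest" selection described in the statement above can be sketched in a few lines. This is a minimal illustration, not the cited authors' implementation: the function name `average_farthest`, the Euclidean metric, and the tuple-based sample format are all assumptions made for the example. The nested distance computation costs on the order of |majority| × |minority| operations, which is the inefficiency the statement points to.

```python
import math

def average_farthest(majority, minority, n_select):
    """Return the n_select majority points with the largest mean
    Euclidean distance to the minority points (hypothetical helper)."""
    def mean_distance(point):
        # One distance per minority sample, so every majority point
        # costs O(len(minority)) work.
        return sum(math.dist(point, m) for m in minority) / len(minority)

    # Sort by mean distance, farthest first, and keep the top n_select.
    return sorted(majority, key=mean_distance, reverse=True)[:n_select]

majority = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
minority = [(0.0, 1.0), (1.0, 0.0)]
selected = average_farthest(majority, minority, n_select=2)
```

Replacing `reverse=True` with `reverse=False` would give the "average nearest" variant mentioned in the same statement.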

“…Under-sampling is a technique to reduce the number of samples in the majority class, where the size of the majority class sample is reduced from the original datasets to balance the class distribution. One simple method of under-sampling (random under-sampling) is to select a subset of majority class samples randomly and then combine them with minority class sample as a training set [7]. Many researchers have proposed some advanced way of under-sampling the majority class data.…”

Section: Introduction (mentioning)

confidence: 99%
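
Random under-sampling as described in the statement above is simple enough to show directly. The sketch below assumes samples are stored in plain Python lists; the function name `random_under_sample`, the default of matching the minority class size, and the fixed seed are illustrative choices, not details taken from the cited papers.

```python
import random

def random_under_sample(majority, minority, n_select=None, seed=0):
    """Randomly keep n_select majority samples and append all minority
    samples to form a training set (hypothetical helper)."""
    rng = random.Random(seed)
    if n_select is None:
        # Assumed default: balance the classes one-to-one.
        n_select = len(minority)
    n_select = min(n_select, len(majority))
    kept = rng.sample(majority, n_select)  # sampling without replacement
    return kept + list(minority)

majority = [(i, i) for i in range(100)]
minority = [(-i, -i) for i in range(1, 11)]
training_set = random_under_sample(majority, minority)
```

Because the subset is drawn without regard to the data distribution, informative majority samples may be discarded, which is one motivation for the more advanced methods the statement goes on to mention.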

“…Chawla [5] proposed an over-sampling approach called SMOTE in which the minority class is over-sampled by creating "synthetic" examples rather than by over-sampling with duplicated real data entries. SMOTE blindly generates synthetic minority class samples without considering majority class samples and may thus cause overgeneralization [7]. Over-sampling may cause longer training time and over-fitting.…”

Section: Introduction (mentioning)

confidence: 99%
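
To make the "synthetic example" idea concrete, here is a simplified SMOTE-style sketch. It is not the reference SMOTE algorithm: it picks seed points at random instead of iterating over every minority sample, and the name `smote_like`, the Euclidean metric, and the parameter defaults are assumptions for illustration. Each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. The majority class is never consulted, which is the "blind" behaviour the statement above criticises.

```python
import math
import random

def smote_like(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by interpolating between
    a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples (simplified sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(x, neighbour))
        )
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=10)
```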

“…On the other hand, if a cluster has more minority class samples and less majority class samples, it does not hold the characteristics of the majority class samples and behaves more like the minority class samples. Therefore, their approach selects a suitable number of majority class samples from each cluster by considering the ratio of the number of majority class samples to the number of minority class samples in the derived cluster [7]. They first cluster the full data to K clusters.…”

Section: Introduction (mentioning)

confidence: 99%
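
The per-cluster selection described in the statement above can be sketched as follows. The statement says only that the number selected from each cluster depends on the cluster's majority-to-minority ratio. The proportional allocation used here, the treatment of clusters with no minority samples (minority count taken as 1), and the names `allocate_per_cluster` and `cluster_based_under_sample` are assumptions made for this example. The clustering step itself is assumed to have been done already, for instance with k-means.

```python
import random

def allocate_per_cluster(cluster_counts, total_to_select):
    """cluster_counts: list of (n_majority, n_minority) per cluster.
    Returns how many majority samples to draw from each cluster,
    proportional to the cluster's majority/minority ratio (assumed rule)."""
    # Assumption: a cluster without minority samples uses a count of 1
    # to avoid division by zero.
    ratios = [n_maj / max(n_min, 1) for n_maj, n_min in cluster_counts]
    total_ratio = sum(ratios)
    if total_ratio == 0:
        return [0] * len(cluster_counts)
    return [
        min(n_maj, round(total_to_select * r / total_ratio))
        for (n_maj, _), r in zip(cluster_counts, ratios)
    ]

def cluster_based_under_sample(clusters, total_to_select, seed=0):
    """clusters: list of (majority_samples, minority_samples) pairs.
    Draws the allocated number of majority samples from each cluster."""
    rng = random.Random(seed)
    counts = [(len(ma), len(mi)) for ma, mi in clusters]
    quotas = allocate_per_cluster(counts, total_to_select)
    selected = []
    for (majority_samples, _), quota in zip(clusters, quotas):
        selected.extend(rng.sample(majority_samples, quota))
    return selected

clusters = [
    (list(range(90)), list(range(10))),        # majority-dominated cluster
    (list(range(100, 110)), list(range(10))),  # balanced cluster
]
quotas = allocate_per_cluster([(90, 10), (10, 10)], total_to_select=20)
selected = cluster_based_under_sample(clusters, total_to_select=20)
```

In this toy input the first cluster has a ratio of 9 and the second a ratio of 1, so most of the 20 selected majority samples come from the cluster that behaves more like the majority class.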

Addressing the Class Imbalance Problem in Medical Datasets

Rahman¹,

Davis²

2013

IJMLC

298

147

Abstract: A well-balanced dataset is very important for creating a good prediction model. Medical datasets are often not balanced in their class labels. Most existing classification methods tend to perform poorly on minority class examples when the dataset is extremely imbalanced. This is because they aim to optimize the overall accuracy without considering the relative distribution of each class. In this paper we examine the performance of over-sampling and under-sampling techniques to balance cardiovascular data. The well-known over-sampling technique SMOTE is used, and some under-sampling techniques are also explored. An improved under-sampling technique is proposed. Experimental results show that the proposed method performs significantly better than the existing methods.

“…Here we give a brief review of cluster-based under-sampling methods for imbalanced data, because it shows more related to our work [20][21][22][23]. These algorithms differ on whether the clustering is done on the whole training data or inside each category.…”

Section: Classification Of Imbalanced Data (mentioning)

confidence: 99%

Local Clustering Conformal Predictor for Imbalanced Data Classification

Wang

Chen

et al. 2013

IFIP Advances in Information and Communication Technology

Abstract. The recently developed Conformal Predictor (CP) can provide calibrated confidence for prediction, which is beyond the capacity of traditional predictors. However, CP works for balanced data and fails in the case of imbalanced data. To handle this problem, the Local Clustering Conformal Predictor (LCCP), which plugs a two-level partition into the framework of CP, is proposed. In the first-level partition, the whole imbalanced training dataset is partitioned into some class-taxonomy data subsets. Secondly, the majority class examples are further partitioned into some cluster-taxonomy data subsets by a clustering method. To predict a new instance, LCCP selects the nearest cluster and combines it with the minority class examples to build a re-balanced training dataset. The designed LCCP model aims not only to provide valid confidence for prediction, but also to significantly improve the prediction efficiency. The experimental results show that the LCCP model is superior to the CP model for imbalanced data classification.

Handling Unbalanced Data in Clinical Images

Verma

2022

Advanced Healthcare Systems
