A Multiple Resampling Method for Learning from Imbalanced Data Sets

Estabrooks, Andrew; Jo, Taeho; Japkowicz, Nathalie

doi:10.1111/j.0824-7935.2004.t01-1-00228.x

Cited by 858 publications

(410 citation statements)

References 13 publications

Supporting

Mentioning

378

Contrasting

Unclassified


“…No oversampling techniques were examined in this study. While it is known that for a few imbalanced datasets oversampling has performed satisfactorily (Japkowicz and Stephen, 2002; Estabrooks and Japkowicz, 2004), in many other cases undersampling proves to be superior to oversampling (Domingos, 1999; Drummond and Holte, 2003).…”

Section: Class Distribution and Classification Performance (mentioning)

confidence: 99%

Evaluation of Classifiers for an Uneven Class Distribution Problem

Daskalaki

Kopanas

Avouris

2006

Applied Artificial Intelligence

149


Classification problems with uneven class distributions present several difficulties during both the training and the evaluation of classifiers. A classification problem with such characteristics has resulted from a data-mining project where the objective was to predict customer insolvency. Using the dataset from the customer insolvency problem, we study several alternative methodologies which have been reported to better suit the specific characteristics of this type of problem. Three different but equally important directions are examined: (a) the performance measures that should be used for problems in this domain, (b) the class distributions that should be used for the training data sets, and (c) the classification algorithms to be used. The final evaluation of the resulting classifiers is based on a study of the economic impact of classification results. This study concludes with a framework that provides the "best" classifiers, identifies the performance measures that should be used as the decision criterion, and suggests the "best" class distribution based on the value of the relative gain from correct classification in the positive class. This framework has been applied to the customer insolvency problem, but it is claimed that it can be applied to many similar problems with uneven class distributions that almost always require a multi-objective evaluation process.

“…In general, it is unclear which approach is more effective and there have been attempts to combine them (Estabrooks et al., 2004). Another main approach is to attempt to modify the sensitivity of the classification algorithm so that errors on the minority class are costlier than errors on the majority class (Veropoulos et al., 1999).…”

Section: Introduction (mentioning)

confidence: 99%

Author identification: Using text sampling to handle the class imbalance problem

Stamatatos

2008

Information Processing & Management

120


“…The problematic consequences thus are different. [23][24][25] Undersampling reduces the imbalance ratio by randomly removing majority examples and thus may lead to the loss of information about the majority class. Oversampling increases the size of the minority class by randomly duplicating minority examples, which may cause overfitting.…”
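The two random resampling strategies described in the statement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `random_undersample` and `random_oversample`, the `target_size` parameter, and the toy 90/10 data set are hypothetical and do not come from the cited papers.

```python
import random
from collections import Counter


def random_undersample(samples, labels, majority_label, target_size, seed=0):
    """Keep every non-majority example and a random subset of the majority class."""
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    other_idx = [i for i, y in enumerate(labels) if y != majority_label]
    # Sampling without replacement: the discarded majority examples are lost.
    kept = rng.sample(majority_idx, min(target_size, len(majority_idx)))
    idx = sorted(other_idx + kept)
    return [samples[i] for i in idx], [labels[i] for i in idx]


def random_oversample(samples, labels, minority_label, target_size, seed=0):
    """Keep every example and append random duplicates of minority examples."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    if not minority_idx:
        raise ValueError("no examples carry the minority label")
    n_extra = max(0, target_size - len(minority_idx))
    # Sampling with replacement: no new information is added, only copies.
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    return [samples[i] for i in idx], [labels[i] for i in idx]


# Hypothetical toy data: 90 majority examples (label 0), 10 minority (label 1).
X = list(range(100))
y = [0] * 90 + [1] * 10

x_under, y_under = random_undersample(X, y, majority_label=0, target_size=10)
x_over, y_over = random_oversample(X, y, minority_label=1, target_size=90)

print(Counter(y_under))  # class counts after undersampling
print(Counter(y_over))   # class counts after oversampling
```

Both calls balance the classes, but in different ways: undersampling discards 80 of the 90 majority examples, while oversampling leaves only 10 distinct minority examples among 90 minority rows, which is the repetition that the statement associates with overfitting.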

Section: Data Preprocessing Approaches (mentioning)

confidence: 99%

Augmenting cost-SVM with gaussian mixture models for imbalanced classification

He

Silva

et al. 2015

AIR


The Support Vector Machine (SVM), a well-known discriminative classifier, is ineffective in dealing with imbalanced classification problems where the training examples of the target class are outnumbered by non-target class examples. Though cost-SVM (cSVM) has been proposed to tackle imbalanced datasets by assigning different cost functions to different classes, its performance is less than satisfactory due to its limited ability to enforce cost-sensitivity. In this research, a generative classifier, the Gaussian Mixture Model (GMM), is studied, which can learn the distribution of the imbalanced data to improve the discriminative power between imbalanced classes. By fusing this knowledge into cSVM, a model fusion approach, termed CSG (cSVM+GMM), is proposed to tackle the imbalanced classification problem. Experimental results on eleven benchmark datasets and one medical imaging dataset show the effectiveness of CSG in dealing with imbalanced classification problems.
