Intelligent Control and Automation
DOI: 10.1007/11816492_89
Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset

Abstract: The training data is the most important factor in improving classification accuracy. However, data in real-world applications often have an imbalanced class distribution: most of the samples belong to the majority class and few to the minority class. In this case, if all the data are used for training, the classifier tends to predict that most incoming data belong to the majority class. Hence, it is important to select suitable training data for classification in…
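The abstract's core idea, selecting a balanced subset of training data rather than using everything, can be illustrated with the simplest variant: random under-sampling of the majority class. This is a minimal sketch, not the paper's specific approach; the function name and list-based representation are assumptions for illustration.

```python
import random

def undersample(majority, minority, seed=0):
    """Randomly down-sample the majority class to the size of the
    minority class, then return the combined balanced training set.
    Hypothetical helper illustrating random under-sampling."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # equal-sized majority subset
    return kept + minority

# 100 majority samples vs. 5 minority samples -> 10-sample balanced set
majority = [("maj", i) for i in range(100)]
minority = [("min", i) for i in range(5)]
balanced = undersample(majority, minority)
print(len(balanced))  # 10
```

Discarding majority samples loses information, which is why the paper studies more careful selection of which majority samples to keep.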

Cited by 60 publications (36 citation statements)
References 3 publications
“…The SMOTE algorithm has been applied with several different classifiers and has also been integrated with boosting and bagging. SMOTE generates synthetic examples with the positive class label while disregarding the negative class examples, which may lead to overgeneralization (Yen and Lee, 2006; Maciejewski and Stefanowski, 2011; Yen and Lee, 2009). This strategy may be especially problematic in the case of highly skewed class distributions, where the minority class examples are very sparse, resulting in a greater chance of class mixture.…”
Section: Re-sampling
confidence: 99%
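The overgeneralization risk mentioned above comes from how SMOTE builds synthetic points: each one is a random interpolation between a minority sample and one of its nearest minority neighbors, placed without checking where majority samples lie. A minimal brute-force sketch of that interpolation step (function names and the list-of-lists representation are assumptions, not the original SMOTE implementation):

```python
import random

def nearest_minority(x, minority, k=3):
    """k nearest minority-class neighbors of x (Euclidean, brute force)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sorted((m for m in minority if m != x), key=lambda m: dist(x, m))[:k]

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a random minority sample and one of its k nearest minority neighbors.
    Majority samples are never consulted -- the source of overgeneralization."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        nb = rng.choice(nearest_minority(x, minority, k))
        gap = rng.random()  # random point on the segment between x and nb
        out.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return out
```

Because every synthetic point lies on a segment between two minority samples, sparse minority regions that happen to span majority territory get filled with positive-labeled points, which is exactly the class-mixture problem the quoted statement describes.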
“…Resolving the imbalanced data problem. To resolve the imbalanced data problem, we used the SMOTE method (see Methods for details) and the under-sampling method 29 on our genome data. The work flow of the mirexplorer classifier.…”
Section: Results
confidence: 99%
“…The number of non-binding segments was much larger than that of the binding segments, which led to a heavy imbalance in the datasets (Table 1). Following the methods of previous works (Yen and Lee, 2006; Roy et al, 2015), we took the number of positive samples as the standard and randomly extracted an equal number of negative samples. The negative samples were randomly selected 10 times to ensure the credibility of the results.…”
Section: Benchmark Dataset
confidence: 99%
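The protocol in the last statement, fixing the positive-sample count and drawing equal-sized negative subsets ten times, can be sketched as repeated balanced under-sampling. This is an illustrative reading of the quoted procedure, not the cited authors' code; the function name and 10-repeat default are assumptions.

```python
import random

def repeated_undersample(positives, negatives, repeats=10, seed=0):
    """Build `repeats` balanced datasets: all positives plus a fresh
    random subset of negatives of the same size each time, so results
    can be averaged over the random negative draws."""
    rng = random.Random(seed)
    return [positives + rng.sample(negatives, len(positives))
            for _ in range(repeats)]
```

Averaging a classifier's performance over the ten balanced datasets reduces the variance introduced by any single random choice of negatives.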