2010
DOI: 10.1109/tsmca.2009.2029559

RUSBoost: A Hybrid Approach to Alleviating Class Imbalance

Abstract: Class imbalance is a problem common to many application domains. When examples of one class in a training data set vastly outnumber examples of the other class(es), traditional data mining algorithms tend to create suboptimal classification models. Several techniques have been used to alleviate the problem of class imbalance, including data sampling and boosting. In this paper, we present a new hybrid sampling/boosting algorithm, called RUSBoost, for learning from skewed training data. This al…

Cited by 1,511 publications (779 citation statements). References 20 publications.
“…-SMOTEBoost (SBO): modified AdaBoost algorithm, in which base classifiers are constructed using SMOTE synthetic sampling (Chawla et al 2003). -RUSBoost (RUS): extension of SMOTEBoost approach, which uses additional undersampling in each boosting iteration (Seiffert et al 2010). -SMOTEBagging (SB): bagging method, which uses SMOTE to oversample dataset before constructing each of base classifiers (Wang and Yao 2009).…”
Section: Methods
confidence: 99%
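The excerpt above notes that RUSBoost performs additional random undersampling in each boosting iteration. As a minimal sketch of that undersampling step (function and variable names are my own illustration, not taken from the paper or any library):

```python
# Illustrative sketch of the random undersampling (RUS) step that RUSBoost
# applies before fitting each weak learner. Pure stdlib; names are hypothetical.
import random

def random_undersample(examples, labels, seed=0):
    """Drop majority-class examples at random until all classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())  # size of smallest class
    bal_x, bal_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # keep n_min examples per class
            bal_x.append(x)
            bal_y.append(y)
    return bal_x, bal_y

X = list(range(12))
y = [0] * 10 + [1] * 2                  # 10:2 class imbalance
bal_x, bal_y = random_undersample(X, y)
print(sorted(bal_y))                    # [0, 0, 1, 1] -> balanced classes
```

In RUSBoost the balanced sample is drawn fresh each boosting round, so different majority examples are seen across iterations even though each individual round discards most of them.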
“…In this group, it is possible to distinguish ensemble classifiers such as SMOTEBoost (Chawla et al 2003), SMOTEBagging (Wang and Yao 2009), RAMOBoost (Chen et al 2010), which make use of oversampling to diversify the base learners, and models such as UnderBagging (Tao et al 2006), Roughly Balanced Bagging (Hido et al 2009), RUSBoost (Seiffert et al 2010) which apply undersampling before creating each of the component classifiers. In addition to the mentioned learning methods for imbalanced data, other internal techniques are successively applied to construct balanced classifiers, e.g., active learning strategies , granular computing (Tang et al 2007), or one-sided classification (Manevitz and Yousef 2002).…”
confidence: 99%
“…Typical data sampling approaches are to oversample the minority class (see Chawla et al 2002 for a description of SMOTEBoost) or undersample the majority class. The RUSBoost (Random Under Sampling) algorithm is designed to classify when one class has many more observations than another and good reference results have been obtained (Seiffert et al 2010). Blackard and Dean (1999) describe an ANN classification of an imbalanced dataset achieving 70.6% accuracy, whereas RUSBoost obtained over 76% classification accuracy.…”
Section: Ensemble Design and Algorithm Implementation
confidence: 99%
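The excerpt above contrasts oversampling (SMOTE) with the undersampling that RUSBoost uses inside boosting. A from-scratch toy version of the combined idea (my own simplified illustration, not the authors' implementation): an AdaBoost loop over decision stumps on 1-D data, where each round's stump is fit on a randomly undersampled, class-balanced subset, while the boosting weight update still runs over the full training set.

```python
# Simplified RUSBoost-style sketch: AdaBoost with threshold stumps, where each
# round trains on a randomly undersampled balanced subset. Illustrative only.
import math
import random

def train_rusboost(X, y, rounds=10, seed=0):
    """X: list of floats; y: list of +1/-1 labels. Returns [(alpha, thr, sign)]."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                       # AdaBoost example weights
    ensemble = []
    for _ in range(rounds):
        # RUS step: draw an equal number of examples from each class.
        pos = [i for i in range(n) if y[i] == 1]
        neg = [i for i in range(n) if y[i] == -1]
        m = min(len(pos), len(neg))
        sample = rng.sample(pos, m) + rng.sample(neg, m)
        # Fit the lowest-weighted-error threshold stump on the balanced sample.
        best = None
        for thr in sorted({X[i] for i in sample}):
            for sign in (1, -1):
                err = sum(w[i] for i in sample
                          if sign * (1 if X[i] >= thr else -1) != y[i])
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        _, thr, sign = best
        # AdaBoost update uses the FULL (imbalanced) training set.
        err_full = sum(w[i] for i in range(n)
                       if sign * (1 if X[i] >= thr else -1) != y[i])
        err_full = min(max(err_full, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err_full) / err_full)
        ensemble.append((alpha, thr, sign))
        for i in range(n):
            pred = sign * (1 if X[i] >= thr else -1)
            w[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * s * (1 if x >= t else -1) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# 9:2 imbalanced 1-D data: the minority class lies above 5.5.
X = [0, 1, 2, 3, 4, 5, 1.5, 2.5, 3.5, 6, 7]
y = [-1] * 9 + [1, 1]
model = train_rusboost(X, y)
print([predict(model, x) for x in [1, 6.5]])  # -> [-1, 1]
```

The key design point the excerpt highlights is visible here: undersampling only shapes what each weak learner sees, while the boosting weights and final vote are computed against the complete data set, so no information is permanently discarded.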
“…Also, rather than just a single classification, the relative weight for a particular class label can be obtained by aggregation across the learners. Due to the very limited number of iron failures (typically around 4-7% across DMAs in a particular year), results from many weak learners (1000 decision trees were utilised in the final models, as in Seiffert et al 2010, with deep trees for higher ensemble accuracy, a minimal leaf size of 1, and a learning rate of 0.1) were melded into one high-quality ensemble predictor using RUSBoost in this application. A protocol for equalising classes and randomly removing data points for particular model subsets is used to remove the imbalance.…”
Section: Ensemble Design and Algorithm Implementation
confidence: 99%
“…Each study was left out in turn from the full set of 13. The points on the remaining 12 paths were fed as a training set into an adapted version of a random subsampling boosting classifier (RUSBoost, presented in [9] and adapted as in Alg. 1).…”
Section: Tachycardia Termination Point Detection
confidence: 99%