2010
DOI: 10.1109/tsmca.2009.2029559

RUSBoost: A Hybrid Approach to Alleviating Class Imbalance

Abstract: Class imbalance is a problem common to many application domains. When examples of one class in a training data set vastly outnumber examples of the other class(es), traditional data mining algorithms tend to create suboptimal classification models. Several techniques have been used to alleviate the problem of class imbalance, including data sampling and boosting. In this paper, we present a new hybrid sampling/boosting algorithm, called RUSBoost, for learning from skewed training data. This al…

Cited by 1,511 publications (779 citation statements). References 20 publications.
“…-SMOTEBoost (SBO): modified AdaBoost algorithm, in which base classifiers are constructed using SMOTE synthetic sampling (Chawla et al 2003). -RUSBoost (RUS): extension of SMOTEBoost approach, which uses additional undersampling in each boosting iteration (Seiffert et al 2010). -SMOTEBagging (SB): bagging method, which uses SMOTE to oversample dataset before constructing each of base classifiers (Wang and Yao 2009).…”
Section: Methods
confidence: 99%
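The excerpt above notes that RUSBoost performs additional random undersampling in each boosting iteration. As a minimal sketch of that undersampling step (function and variable names are my own illustration, not taken from the paper or any library):

```python
# Illustrative sketch of the random undersampling (RUS) step that RUSBoost
# applies before fitting each weak learner. Pure stdlib; names are hypothetical.
import random

def random_undersample(examples, labels, seed=0):
    """Drop majority-class examples at random until all classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())  # size of smallest class
    bal_x, bal_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # keep n_min examples per class
            bal_x.append(x)
            bal_y.append(y)
    return bal_x, bal_y

X = list(range(12))
y = [0] * 10 + [1] * 2                  # 10:2 class imbalance
bal_x, bal_y = random_undersample(X, y)
print(sorted(bal_y))                    # [0, 0, 1, 1] -> balanced classes
```

In RUSBoost the balanced sample is drawn fresh each boosting round, so different majority examples are seen across iterations even though each individual round discards most of them.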
“…In this group, it is possible to distinguish ensemble classifiers such as SMOTEBoost (Chawla et al 2003), SMOTEBagging (Wang and Yao 2009), RAMOBoost (Chen et al 2010), which make use of oversampling to diversify the base learners, and models such as UnderBagging (Tao et al 2006), Roughly Balanced Bagging (Hido et al 2009), RUSBoost (Seiffert et al 2010) which apply undersampling before creating each of the component classifiers. In addition to the mentioned learning methods for imbalanced data, other internal techniques are successively applied to construct balanced classifiers, e.g., active learning strategies , granular computing (Tang et al 2007), or one-sided classification (Manevitz and Yousef 2002).…”
confidence: 99%
“…Typical data sampling approaches are to oversample the minority class (see Chawla et al 2002 for a description of SMOTEBoost) or undersample the majority class. The RUSBoost (Random Under Sampling) algorithm is designed to classify when one class has many more observations than another and good reference results have been obtained (Seiffert et al 2010). Blackard and Dean (1999) describe an ANN classification of an imbalanced dataset achieving 70.6% accuracy, whereas RUSBoost obtained over 76% classification accuracy.…”
Section: Ensemble Design and Algorithm Implementation
confidence: 99%
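The excerpt above contrasts oversampling (SMOTE) with the undersampling that RUSBoost uses inside boosting. A from-scratch toy version of the combined idea (my own simplified illustration, not the authors' implementation): an AdaBoost loop over decision stumps on 1-D data, where each round's stump is fit on a randomly undersampled, class-balanced subset, while the boosting weight update still runs over the full training set.

```python
# Simplified RUSBoost-style sketch: AdaBoost with threshold stumps, where each
# round trains on a randomly undersampled balanced subset. Illustrative only.
import math
import random

def train_rusboost(X, y, rounds=10, seed=0):
    """X: list of floats; y: list of +1/-1 labels. Returns [(alpha, thr, sign)]."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                       # AdaBoost example weights
    ensemble = []
    for _ in range(rounds):
        # RUS step: draw an equal number of examples from each class.
        pos = [i for i in range(n) if y[i] == 1]
        neg = [i for i in range(n) if y[i] == -1]
        m = min(len(pos), len(neg))
        sample = rng.sample(pos, m) + rng.sample(neg, m)
        # Fit the lowest-weighted-error threshold stump on the balanced sample.
        best = None
        for thr in sorted({X[i] for i in sample}):
            for sign in (1, -1):
                err = sum(w[i] for i in sample
                          if sign * (1 if X[i] >= thr else -1) != y[i])
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        _, thr, sign = best
        # AdaBoost update uses the FULL (imbalanced) training set.
        err_full = sum(w[i] for i in range(n)
                       if sign * (1 if X[i] >= thr else -1) != y[i])
        err_full = min(max(err_full, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err_full) / err_full)
        ensemble.append((alpha, thr, sign))
        for i in range(n):
            pred = sign * (1 if X[i] >= thr else -1)
            w[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * s * (1 if x >= t else -1) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# 9:2 imbalanced 1-D data: the minority class lies above 5.5.
X = [0, 1, 2, 3, 4, 5, 1.5, 2.5, 3.5, 6, 7]
y = [-1] * 9 + [1, 1]
model = train_rusboost(X, y)
print([predict(model, x) for x in [1, 6.5]])  # -> [-1, 1]
```

The key design point the excerpt highlights is visible here: undersampling only shapes what each weak learner sees, while the boosting weights and final vote are computed against the complete data set, so no information is permanently discarded.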
“…Also, rather than just a single classification, the relative weight for a particular class label can be obtained by aggregation across the learners. Due to the very limited number of iron failures (typically around 4-7% across DMAs in a particular year), results from many weak learners (1000 decision trees were utilised in the final models, as in Seiffert et al 2010, with deep trees for higher ensemble accuracy, a minimal leaf size of 1, and a learning rate of 0.1) were melded into one high-quality ensemble predictor using RUSBoost in this application. A protocol for equalising classes and randomly removing data points for particular model subsets is used to remove the imbalance.…”
Section: Ensemble Design and Algorithm Implementation
confidence: 99%
“…Each study was left out in turn from the full set of 13. The points on the remaining 12 paths were fed as a training set into an adapted version of a random subsampling boosting classifier (RUSBoost, presented in [9] and adapted as in Alg. 1).…”
Section: Tachycardia Termination Point Detection
confidence: 99%