Amri Napolitano scite author profile

Abstract: Class imbalance is a problem that is common to many application domains. When examples of one class in a training data set vastly outnumber examples of the other class(es), traditional data mining algorithms tend to create suboptimal classification models. Several techniques have been used to alleviate the problem of class imbalance, including data sampling and boosting. In this paper, we present a new hybrid sampling/boosting algorithm, called RUSBoost, for learning from skewed training data. This algorithm provides a simpler and faster alternative to SMOTEBoost, which is another algorithm that combines boosting and data sampling. This paper evaluates the performances of RUSBoost and SMOTEBoost, as well as their individual components (random undersampling, synthetic minority oversampling technique, and AdaBoost). We conduct experiments using 15 data sets from various application domains, four base learners, and four evaluation metrics. RUSBoost and SMOTEBoost both outperform the other procedures, and RUSBoost performs comparably to (and often better than) SMOTEBoost while being a simpler and faster technique. Given these experimental results, we highly recommend RUSBoost as an attractive alternative for improving the classification performance of learners built using imbalanced data.
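The abstract names random undersampling as one of RUSBoost's components. As a rough illustration of that one step only (not the boosting loop, and not the authors' implementation), the sketch below drops majority-class examples at random until the minority class reaches a chosen share of the data. The function name `random_undersample`, the `minority_share` parameter, and its 50% default are assumptions made for this example; they do not come from the page.

```python
import random
from collections import Counter


def random_undersample(X, y, minority_share=0.5, seed=0):
    """Hypothetical helper: randomly drop majority-class examples until the
    minority class makes up roughly `minority_share` of a two-class data set."""
    if not 0 < minority_share < 1:
        raise ValueError("minority_share must be strictly between 0 and 1")
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two classes")

    # most_common() sorts by count, so the majority class comes first.
    (majority, n_major), (minority, n_minor) = counts.most_common()

    # Majority examples to keep so that minority / total ≈ minority_share.
    n_keep = min(n_major, round(n_minor * (1 - minority_share) / minority_share))

    rng = random.Random(seed)  # seeded so the example is reproducible
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = set(rng.sample(majority_idx, n_keep))

    # Keep every minority example plus the sampled majority examples,
    # preserving the original order.
    keep_idx = [i for i, label in enumerate(y) if label == minority or i in kept]
    return [X[i] for i in keep_idx], [y[i] for i in keep_idx]


if __name__ == "__main__":
    X = list(range(100))
    y = [0] * 90 + [1] * 10  # 90 majority examples, 10 minority examples
    X_small, y_small = random_undersample(X, y)
    print(Counter(y_small))
```

Because examples are only removed, never synthesized, the resampled set is smaller than the original, which fits the abstract's description of RUSBoost as the simpler and faster alternative to SMOTEBoost.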

Experimental perspectives on learning from imbalanced data

Hulse

2007

RUSBoost: Improving classification performance when training data is skewed

et al. 2008

Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data

Khoshgoftaar

Hulse

Napolitano

2011

IEEE Trans. Syst., Man, Cybern. A

A review of the stability of feature selection techniques for bioinformatics data

Awada

Khoshgoftaar

Dittman

et al. 2012

RUSBoost: A Hybrid Approach to Alleviating Class Imbalance
