Abstract. The class imbalance problem occurs in various disciplines when one of the target classes has a tiny number of instances compared to the other classes. A typical classifier normally ignores or fails to detect the minority class because it has so few instances. SMOTE is an over-sampling technique that remedies this situation by generating minority instances within the overlapping regions. However, SMOTE synthesizes minority instances at random positions along a line joining a minority instance and one of its selected nearest neighbours, ignoring nearby majority instances. Our technique, Safe-Level-SMOTE, samples minority instances more carefully along the same line, weighting positions by a degree called the safe level. The safe level of an instance is computed from the number of minority instances among its nearest neighbours. By synthesizing more minority instances near the end of the line with the larger safe level, we achieve better accuracy than SMOTE and Borderline-SMOTE.
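The safe-level weighting can be sketched in Python. This is a minimal, non-authoritative sketch based on our reading of the published Safe-Level-SMOTE procedure, under stated assumptions: safe levels are counted over the k nearest neighbours in the full training set (Euclidean distance), the partner n is drawn from the k nearest minority neighbours of p as in SMOTE, and the function names (`minority_neighbours`, `safe_level`, `safe_level_smote`) are hypothetical. The interpolation gap along the segment from p to n is biased toward whichever endpoint has the larger safe level.

```python
import math
import random


def minority_neighbours(p, X, minority_idx, k):
    """Indices of the k nearest minority instances to X[p], excluding p itself."""
    others = [i for i in minority_idx if i != p]
    others.sort(key=lambda i: math.dist(X[p], X[i]))
    return others[:k]


def safe_level(i, X, y, k, minority):
    """Safe level: count of minority labels among the k nearest neighbours of X[i]."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return sum(1 for j in others[:k] if y[j] == minority)


def safe_level_smote(X, y, k=5, minority=1, seed=0):
    """Return one synthetic minority point (or none) per minority instance."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    synthetic = []
    for p in minority_idx:
        candidates = minority_neighbours(p, X, minority_idx, k)
        if not candidates:
            continue
        n = rng.choice(candidates)
        sl_p = safe_level(p, X, y, k, minority)
        sl_n = safe_level(n, X, y, k, minority)
        if sl_p == 0 and sl_n == 0:
            continue  # case 1: both endpoints look like noise; generate nothing
        s = []
        for a in range(len(X[p])):
            if sl_n == 0:
                gap = 0.0  # case 2: partner looks like noise; duplicate p
            else:
                ratio = sl_p / sl_n
                if ratio == 1:
                    gap = rng.uniform(0, 1)  # case 3: equally safe; anywhere on the line
                elif ratio > 1:
                    gap = rng.uniform(0, 1 / ratio)  # case 4: stay nearer the safer p
                else:
                    gap = rng.uniform(1 - ratio, 1)  # case 5: move nearer the safer n
            s.append(X[p][a] + gap * (X[n][a] - X[p][a]))
        synthetic.append(s)
    return synthetic
```

In this sketch, a minority instance with no minority neighbours, paired with a partner in the same situation, produces no synthetic sample, so isolated noise points are not amplified.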
A dataset exhibits the class imbalance problem when a target class has a very small number of instances relative to other classes. A trivial classifier typically fails to detect a minority class due to its extremely low incidence rate. In this paper, a new over-sampling technique called DBSMOTE is proposed. Our technique relies on a density-based notion of clusters and is designed to oversample an arbitrarily shaped cluster discovered by DBSCAN. DBSMOTE generates synthetic instances along the shortest path from each positive instance to a pseudo-centroid of a minority-class cluster. Consequently, these synthetic instances are dense near the centroid and sparse far from it. Our experimental results show that DBSMOTE improves precision, F-value, and AUC more effectively than SMOTE, Borderline-SMOTE, and Safe-Level-SMOTE for imbalanced datasets.
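The path-based generation step can be sketched as follows. This is a simplified, hypothetical sketch rather than the authors' implementation: it assumes one minority-class cluster has already been found by DBSCAN, approximates the density-reachability graph by linking members whose Euclidean distance is at most `eps`, takes the pseudo-centroid to be the member closest to the cluster mean, and finds shortest paths with Dijkstra's algorithm. The function names are illustrative. Each instance contributes one synthetic point on a randomly chosen edge of its path to the pseudo-centroid.

```python
import heapq
import math
import random


def pseudo_centroid(cluster):
    """Index of the cluster member closest to the arithmetic mean (an assumption)."""
    dim = len(cluster[0])
    mean = [sum(pt[a] for pt in cluster) / len(cluster) for a in range(dim)]
    return min(range(len(cluster)), key=lambda i: math.dist(cluster[i], mean))


def shortest_path_tree(cluster, eps, source):
    """Dijkstra over a graph linking members at most eps apart; returns predecessors."""
    dist = {source: 0.0}
    prev = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(len(cluster)):
            w = math.dist(cluster[u], cluster[v])
            if v != u and w <= eps and d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return prev


def dbsmote_cluster(cluster, eps, seed=0):
    """One synthetic point per reachable member, placed on its path to the centroid."""
    rng = random.Random(seed)
    c = pseudo_centroid(cluster)
    prev = shortest_path_tree(cluster, eps, c)
    synthetic = []
    for i in range(len(cluster)):
        if i == c or i not in prev:
            continue  # the pseudo-centroid itself, or unreachable within eps
        # Walk from instance i back to the pseudo-centroid.
        path = [i]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        # Pick one edge of the path at random, then a random point on that edge.
        k = rng.randrange(len(path) - 1)
        a, b = cluster[path[k]], cluster[path[k + 1]]
        gap = rng.random()
        synthetic.append([x + gap * (z - x) for x, z in zip(a, b)])
    return synthetic
```

Because paths from many instances converge on the pseudo-centroid, the edges near it are shared by many paths, so synthetic points tend to accumulate there, in line with the dense-near, sparse-far behaviour described above.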
This paper proposes a very fast 1-pass-throw-away learning algorithm based on a hyperellipsoidal function that can be translated and rotated to cover the data set during the learning process. The translation and rotation of the hyperellipsoidal function depend on the distribution of the data set. In addition, we present a versatile elliptic basis function (VEBF) neural network with one hidden layer. The hidden layer is adaptively divided into subhidden layers according to the number of classes in the training data set. Each subhidden layer can be expanded by adding a new node to learn new samples during training. The learning time is O(n), where n is the number of data points. The network can learn each new incoming datum independently, without revisiting previously learned data, so there is no need to store all the data in order to mix it with new incoming data during learning.
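One way to translate and rotate an ellipse one datum at a time, without storing earlier data, is a recursive (Welford-style) update of its centre and covariance. The sketch below is an assumption on our part, not the VEBF update equations: the class name `OnlineEllipse` is hypothetical, the centre is the running mean (translation), and in two dimensions the principal-axis angle of the running covariance gives the rotation. Each update costs O(d^2) for d features regardless of how many points have been seen, which is consistent with an O(n) total learning time.

```python
import math


class OnlineEllipse:
    """Hypothetical one-pass node tracking the centre and covariance of its points."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        # Running sum of outer products of deviations (co-moment matrix).
        self.m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Absorb one datum, which can then be thrown away."""
        self.n += 1
        delta = [xi - mi for xi, mi in zip(x, self.mean)]
        # Translation: shift the centre toward the new datum.
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, self.mean)]
        for i in range(len(x)):
            for j in range(len(x)):
                self.m2[i][j] += delta[i] * delta2[j]

    def covariance(self):
        """Population covariance of all data seen so far."""
        return [[v / self.n for v in row] for row in self.m2]

    def orientation_2d(self):
        """Rotation: angle of the principal axis of a 2-D ellipse, in radians."""
        c = self.covariance()
        return 0.5 * math.atan2(2 * c[0][1], c[0][0] - c[1][1])
```

Feeding it the points (0, 0), (1, 1), (2, 2), (3, 3) one at a time yields a centre of (1.5, 1.5) and a principal-axis angle of pi/4, the same as a batch computation over all four points.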