2015
DOI: 10.3233/ida-150789
Improving Random Forest and Rotation Forest for highly imbalanced datasets

Cited by 35 publications (20 citation statements)
References 24 publications
Citing publications: 2017–2024
“…Before selecting subsamples, the sets of sample attributes are randomly segmented and combined to obtain sequences of attribute subsets, whose data can then be preprocessed by feature transformation. Compared with the Random Forest algorithm, on which Rotation Forest (RF) builds, the RF algorithm performs better on high-dimensional, small-sample databases [60, 61, 62]. The main procedure for building an RF model is: (1) divide the attribute set into several subsets; (2) obtain sample subsets by resampling and apply a feature transformation to each attribute subset; (3) rearrange the rotation matrix to follow the order of the original attribute set; (4) train base classifiers on the rotated data; and (5) integrate the outputs of the base classifiers and output the final predicted category.…”
Section: Methodology
confidence: 99%
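The five-step procedure quoted above maps directly onto code. Below is a minimal sketch of Rotation Forest following those steps, assuming scikit-learn's PCA as the feature transformation and decision trees as the base classifiers; the class name, number of feature subsets, and bootstrap fraction are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

class RotationForestSketch:
    """Illustrative Rotation Forest; assumes non-negative integer class labels."""

    def __init__(self, n_estimators=10, n_feature_subsets=3, random_state=None):
        self.n_estimators = n_estimators
        self.n_feature_subsets = n_feature_subsets
        self.rng = np.random.default_rng(random_state)

    def _build_rotation(self, X):
        n_features = X.shape[1]
        # (1) randomly segment the attribute set into disjoint subsets
        subsets = np.array_split(self.rng.permutation(n_features),
                                 self.n_feature_subsets)
        rotation = np.zeros((n_features, n_features))
        for subset in subsets:
            # (2) draw a bootstrap sample subset and fit the feature
            # transformation (here PCA) on just these attributes
            rows = self.rng.choice(len(X), size=int(0.75 * len(X)), replace=True)
            pca = PCA(n_components=len(subset)).fit(X[np.ix_(rows, subset)])
            # (3) place the loadings back at the original attribute positions,
            # so the assembled rotation matrix follows the original feature order
            rotation[np.ix_(subset, subset)] = pca.components_.T
        return rotation

    def fit(self, X, y):
        self.rotations_, self.trees_ = [], []
        for _ in range(self.n_estimators):
            R = self._build_rotation(X)
            # (4) train one base classifier on the rotated data
            # (PCA centering is ignored; train and test use the same rotation)
            self.rotations_.append(R)
            self.trees_.append(DecisionTreeClassifier().fit(X @ R, y))
        return self

    def predict(self, X):
        # (5) integrate the base classifiers' outputs by majority vote
        votes = np.stack([tree.predict(X @ R).astype(int)
                          for R, tree in zip(self.rotations_, self.trees_)])
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                                   0, votes)
```

Each tree sees all samples in a differently rotated feature space, which is what gives the ensemble its diversity relative to plain Random Forest.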
“…Rotation Forest has also been applied to imbalanced problems; for example, Su et al. [22] used the Hellinger distance decision tree (HDDT) [23, 24] instead of C4.5 to train the individual classifiers on the whole training set. Hosseinzadeh and Eftekharia [25] learned Rotation Forest on data obtained by preprocessing the training set with the synthetic minority oversampling technique (SMOTE) [8] and fuzzy clustering [40].…”
Section: Ensemble For Imbalanced Problem
confidence: 99%
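The HDDT variant mentioned in this excerpt replaces C4.5's information-gain criterion with the Hellinger distance between the class-conditional distributions a split induces; because the criterion uses only within-class proportions and ignores class priors, it is insensitive to skew. A minimal sketch of that criterion for a binary split follows; the function name and interface are illustrative, not taken from the cited papers.

```python
import numpy as np

def hellinger_split_value(y, go_left, pos_label=1):
    """Hellinger distance between the class-conditional distributions
    induced by a candidate split (True = sample goes to the left branch)."""
    y, go_left = np.asarray(y), np.asarray(go_left)
    pos, neg = (y == pos_label), (y != pos_label)
    t_pos, t_neg = pos.sum(), neg.sum()      # per-class totals (priors unused)
    d = 0.0
    for branch in (go_left, ~go_left):       # left branch, then right branch
        p = (branch & pos).sum() / t_pos     # P(branch | positive class)
        q = (branch & neg).sum() / t_neg     # P(branch | negative class)
        d += (np.sqrt(p) - np.sqrt(q)) ** 2
    return np.sqrt(d)                        # ranges over [0, sqrt(2)]

# A split that isolates most minority samples scores near the maximum
# even though the minority class is only 5% of the data.
y = np.array([0] * 95 + [1] * 5)
split = np.array([False] * 95 + [True] * 5)  # send the minority left
print(hellinger_split_value(y, split))       # ~1.414 (maximal separation)
```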
“…The main heuristic is to apply feature extraction and subsequently reconstruct a full feature set for each classifier in the ensemble. This method has also been applied to class-imbalanced data; for example, Su et al. [22] employed the Hellinger distance decision tree (HDDT) [23, 24] instead of C4.5 or CART as the base learner of Rotation Forest to deal with class-imbalance issues. Hosseinzadeh and Eftekharia [25] preprocessed the original data with fuzzy clustering and the synthetic minority oversampling technique (SMOTE) to obtain the training set on which Rotation Forest is learned.…”
Section: Introduction
confidence: 99%
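The preprocess-then-learn pattern described in these excerpts can be sketched with imbalanced-learn's SMOTE. The fuzzy-clustering step of Hosseinzadeh and Eftekharia is omitted here, the data are a synthetic toy problem, and the ensemble reuses the illustrative RotationForestSketch class defined in the earlier sketch.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# A highly imbalanced toy problem (roughly 5% minority class).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
print("class counts before:", Counter(y))

# Oversample the minority class, then learn the ensemble on the balanced set.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("class counts after: ", Counter(y_res))

clf = RotationForestSketch(n_estimators=10, random_state=0).fit(X_res, y_res)
print("minority-class prediction rate:", clf.predict(X).mean())
```

Oversampling happens strictly before ensemble construction here, which is the structure both excerpts attribute to Hosseinzadeh and Eftekharia.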
“…This method has also been applied to imbalanced problems; for example, Su et al. [21] employed a class-imbalance-oriented learner, namely the Hellinger distance decision tree (HDDT), as the base classifier of rotation forest to handle class-imbalanced problems, with each base classifier constructed on the whole training set. Hosseinzadeh and Eftekharia [22] learned rotation forest on data obtained by preprocessing the training set with the synthetic minority oversampling technique (SMOTE) and fuzzy clustering.…”
Section: Strategies For Imbalanced Medical Datasets
confidence: 99%