Imbalanced learning is an active topic in the data mining and machine learning communities. Data-level, algorithm-level, and ensemble solutions are the three main families of methods proposed thus far to address it. To alleviate the problems of data explosion and feature selection in the multilayer perceptron based on simultaneous two-sample representation (S2SMLP), this paper first exploits spectral clustering to select majority samples and thereby construct a smaller training set for the classifier. We partition all majority samples into clusters via spectral clustering, extract a different number of representative samples from each cluster according to the cluster's size and its average distance to the minority class, and then construct the classifier's training set by combining these extracted majority samples with all minority samples. Second, we propose a novel feature selection method based on a pairwise sample-distance constraint: it considers the class labels of paired samples and selects the features that push two similar samples closer together and pull two dissimilar samples farther apart. Finally, we conduct extensive experiments on 44 two-class imbalanced datasets and four high-dimensional DNA microarray datasets. The experimental results demonstrate that the proposed algorithms outperform several state-of-the-art algorithms in terms of F-measure, G-mean, and AUC.

INDEX TERMS Multilayer perceptron, under-sampling, spectral clustering, imbalance learning, feature selection, information gain

I. INTRODUCTION

Classification is one of the central topics in machine learning. Its main task is to learn a classification model from training data and predict the labels of unknown samples.
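The cluster-based under-sampling step summarized above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's exact procedure: it assumes the spectral-clustering labels for the majority class are already computed, and it uses one plausible per-cluster quota (proportional to cluster size and inversely proportional to the cluster's average distance to the minority class); the representatives within each cluster are drawn at random here.

```python
import numpy as np

def undersample_by_clusters(X_maj, labels, X_min, target_size, seed=0):
    """Select about `target_size` majority samples, drawing from each
    cluster in proportion to its size and its closeness to the minority
    class (a hypothetical weighting, for illustration only)."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(labels)
    sizes = np.array([(labels == c).sum() for c in clusters])
    # average Euclidean distance from each cluster to the minority class
    avg_dist = np.array([
        np.linalg.norm(X_maj[labels == c][:, None, :] - X_min[None, :, :],
                       axis=2).mean()
        for c in clusters
    ])
    # larger clusters and clusters closer to the minority class
    # contribute more representatives
    weights = sizes / avg_dist
    weights = weights / weights.sum()
    quotas = np.maximum(1, np.round(weights * target_size).astype(int))
    picked = []
    for c, q in zip(clusters, quotas):
        idx = np.flatnonzero(labels == c)
        picked.append(rng.choice(idx, size=min(q, idx.size), replace=False))
    keep = np.concatenate(picked)
    # final training set: sampled majority plus all minority samples
    X_train = np.vstack([X_maj[keep], X_min])
    y_train = np.concatenate([np.zeros(keep.size), np.ones(len(X_min))])
    return X_train, y_train
```

In the paper itself the clustering is spectral and the representatives are chosen deliberately rather than at random; the sketch only shows how the sampled majority subset and the full minority set are combined into the reduced training set.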
To date, many classification models have been proposed and are widely used in real-world applications; for example, naive Bayes (NB), logistic regression (LR), and support vector machines (SVM) have been successfully employed in spam recognition, bank loan credit scoring, and network rumor recognition, respectively. The success of traditional classification models usually rests on the assumption that the classes in the original dataset are balanced [1], but this assumption does not always hold. In other words, a dataset may suffer from the class-imbalance problem [2]: one class contains a large number of samples and is called the majority class (or negative class), while the other class has very few samples and is called the minority class (or positive class). The imbalance ratio IR is defined as follows.