2016
DOI: 10.1371/journal.pone.0146116

Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification

Abstract: Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utili…
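
The abstract is cut off before it describes how GA-EoC encodes or scores candidate ensembles, so the following is only a generic sketch of the idea it names: a genetic algorithm searching over subsets of a classifier pool. Everything specific here is an assumption made for illustration and is not taken from the paper: the bitmask encoding, majority voting, balanced accuracy as the fitness, one-point crossover, bit-flip mutation, and the toy predictions standing in for trained base classifiers.

```python
import random

def majority_vote(preds, mask):
    """Combine the selected classifiers' 0/1 predictions; ties go to class 0."""
    chosen = [p for p, keep in zip(preds, mask) if keep]
    n = len(chosen)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*chosen)]

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (assumes both classes 0 and 1 occur in y_true)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / 2

def fitness(mask, preds, y_true):
    """Score a candidate ensemble; an empty ensemble scores 0."""
    if not any(mask):
        return 0.0
    return balanced_accuracy(y_true, majority_vote(preds, mask))

def ga_search(preds, y_true, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve bitmasks over the classifier pool (needs at least 2 classifiers)."""
    rng = random.Random(seed)
    n = len(preds)
    score = lambda m: fitness(m, preds, y_true)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)       # pick two parents from the elite
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)

# Toy validation set with a 2:6 class imbalance (hypothetical data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical validation predictions of four base classifiers.
preds = [
    [1, 1, 0, 0, 0, 0, 0, 0],   # matches the labels
    [0, 0, 0, 0, 0, 0, 0, 0],   # always predicts the majority class
    [0, 0, 0, 0, 0, 0, 0, 0],   # always predicts the majority class
    [1, 1, 1, 1, 1, 1, 1, 1],   # always predicts the minority class
]
best = ga_search(preds, y_true)
print(best, fitness(best, preds, y_true))
```

In a real setting the rows of `preds` would come from base classifiers evaluated on held-out data, and the fitness would be whatever imbalance-aware metric the study actually uses.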

Citations: cited by 67 publications (48 citation statements)
References: 41 publications
“…These data are also supported by another study [14], where the repeated tuning of an MF and RS was carried out by the optimization method could achieve the dimensionality challenges and multiple-class imbalanced data for optimal classifications. The lack of previous studies with the application of WSA for gene selection and RS based on multiclass gene expression datasets, making it difficult to compare our results directly is also one of the limitations in this study.…”
Section: Discussion (supporting)
confidence: 56%
“…Moreover, it was also demonstrated that finding an optimal number of genes for multiclass problems is more beneficial for diagnosis of cancer. The ensemble combinatorial search is integrated into GA [14] as a single objective GA for optimization of the ensemble technique to classify class-imbalanced datasets. Nonetheless, a single objective GA attempts to locate solutions closer to the local optimum and hence the average error is much greater than in the proposed approach, which finds global optimal solutions for the classification.…”
Section: Discussion (mentioning)
confidence: 99%
“…Improvement strategy: (1) In order to ensure that the distribution of the data samples in the training data set is consistent with the overall distribution, the classification accuracy of the base classifier for a few classes of data on an unbalanced data set is improved, According to the literature [1], Firstly, the data set is changed based on the SMOTE algorithm idea and the undersampling method, and the unbalanced data is changed into relatively equalized data. Then the traditional Bagging algorithm is substituted into the base classifier to weaken the minority and majority classes.…”
Section: Algorithm Improvement Background and Strategy (mentioning)
confidence: 99%
“…The so-called non-equilibrium data set refers to the fact that there are far more samples of certain classes in the data set than other classes [1], in which a class with many samples is called a majority class, and a class with a few samples is called a minority class. The classification of unbalanced data sets is very common in practical applications, such as fraudulent identification of fraudulent card transactions, oil well eruption of satellite image detection, failure prediction of telecommunication equipment, medical diagnosis [2], bankruptcy prediction, radar image surveillance, financial loan management, and fraud.…”
Section: Introduction (mentioning)
confidence: 99%