Davies Segera scite author profile

2021

Feature selection is the process of decreasing the number of features in a dataset by removing redundant, irrelevant, and randomly class-correlated features. Applying feature selection to large, high-dimensional datasets removes the redundant features, which reduces both the complexity of the data and the training time. The objective of this paper was to design an optimizer that combines the well-known population-based metaheuristic, the grey wolf algorithm, with the gradient descent algorithm, and to test it on feature selection problems. The proposed algorithm was first compared against the original grey wolf algorithm on 23 continuous test functions. The proposed optimizer was then adapted for feature selection, and 3 binary implementations were developed, with the final implementation compared against the two implementations of the binary grey wolf optimizer and the binary grey wolf particle swarm optimizer on 6 medical datasets from the UCI machine learning repository, on metrics such as accuracy, size of feature subsets, F-measure, precision, and sensitivity. The proposed optimizer outperformed the three other optimizers on 3 of the 6 datasets in average metrics. The proposed optimizer showed promise in its capability to balance the two objectives in feature selection and could be further enhanced.
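The "two objectives" mentioned in the abstract above are classification performance and the size of the selected feature subset. A common way for wrapper-based selectors to balance them is a weighted sum. The sketch below illustrates that idea only; the function name, the weight alpha, and its default value are assumptions for illustration and are not taken from the paper.

```python
def subset_fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classification error and relative subset size.

    Lower is better. `mask` is a sequence of 0/1 flags, one per feature.
    `alpha` (an assumed weight, not from the paper) trades error against size.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_total = len(mask)
    if n_total == 0:
        raise ValueError("mask must contain at least one feature flag")
    n_selected = sum(1 for bit in mask if bit)
    if n_selected == 0:
        # An empty subset cannot be classified on, so treat it as infeasible.
        return float("inf")
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total
```

With the same error rate, a mask that keeps fewer features gets a lower (better) score, which is how a single scalar can steer the search toward compact subsets.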

Particle Swarm Optimized Hybrid Kernel-Based Multiclass Support Vector Machine for Microarray Cancer Data Analysis

Segera, Mbuthia, Nyete

2020

An Innovative Excited-ACS-IDGWO Algorithm for Optimal Biomedical Data Feature Selection

2020

Finding an optimal set of discriminative features is still a crucial but challenging task in biomedical science. The complexity of the task is intensified when either of two scenarios arises: a high-dimensional dataset or a dataset with a small sample size. The first scenario poses a big challenge to existing machine learning approaches, since the search space for identifying the most relevant feature subset is too large to be explored quickly with minimal computational resources. The second scenario leaves too few samples to learn from. Though many hybrid metaheuristic approaches (i.e., combining multiple search algorithms) have been proposed in the literature to address these challenges, with very attractive performance compared to their standalone counterparts, superior hybrid approaches can be achieved if the individual metaheuristics within the proposed hybrid algorithms are improved prior to the hybridization. Motivated by this, we propose a new hybrid Excited- (E-) Adaptive Cuckoo Search- (ACS-) Intensification Dedicated Grey Wolf Optimization (IDGWO), i.e., EACSIDGWO. EACSIDGWO is an algorithm where the step size of ACS and the nonlinear control strategy of the vector parameter a of the IDGWO are innovatively made adaptive via the concept of the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit. Since the population has a higher diversity at early stages of the proposed EACSIDGWO algorithm, both the ACS and IDGWO are jointly involved in local exploitation. On the other hand, to enhance mature convergence at later stages of the proposed algorithm, the role of ACS is switched to global exploration while the IDGWO is still left conducting the local exploitation.
To show that the proposed algorithm is superior at learning from fewer instances and at selecting an optimal feature subset from information-rich biomedical data, all while maintaining a high classification accuracy, the EACSIDGWO is employed to solve the feature selection problem. The EACSIDGWO as a feature selector is tested on six standard biomedical datasets from the University of California at Irvine (UCI) repository. The experimental results are compared with state-of-the-art feature selection techniques, including binary ant-colony optimization (BACO), binary genetic algorithm (BGA), binary particle swarm optimization (BPSO), and extended binary cuckoo search algorithm (EBCSA). These results reveal that the EACSIDGWO has comprehensive superiority in tackling the feature selection problem, which proves the capability of the proposed algorithm in solving real-world complex problems. Furthermore, the superiority of the proposed algorithm is proved via various numerical techniques like ranking methods and statistical analysis.
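The abstract above ties the adaptive parameters to the complete response of a DC-excited RC circuit but does not give the mapping. As a rough illustration, the textbook complete response is v(t) = V_final + (V_initial - V_final) * exp(-t / tau). The sketch below uses that curve to decay a control parameter over iterations. The start value 2 and end value 0 follow the usual grey wolf optimizer convention; the function names and the choice of five time constants per run are assumptions, not the paper's formulation.

```python
import math


def rc_response(t, initial, final, tau):
    """Complete response of a first-order RC circuit at time t."""
    return final + (initial - final) * math.exp(-t / tau)


def control_parameter_schedule(max_iter, a_start=2.0, a_end=0.0,
                               time_constants=5.0):
    """Values of the control parameter for iterations 0..max_iter.

    `time_constants` (assumed) sets how many RC time constants fit into
    one run, so the curve has nearly settled by the final iteration.
    """
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    tau = max_iter / time_constants
    return [rc_response(t, a_start, a_end, tau) for t in range(max_iter + 1)]
```

Compared with a linear decay, an exponential curve of this kind drops quickly at first and flattens later, so more iterations are spent at small parameter values.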

Metaheuristics for optimal feature selection in high-dimensional datasets

2023

A Master-Slave Binary Grey Wolf Optimizer for Optimal Feature Selection in Biomedical Data Classification

Momanyi

2021

A new master-slave binary grey wolf optimizer (MSBGWO) is introduced. A master-slave learning scheme is introduced to the grey wolf optimizer (GWO) to improve its ability to explore and get better solutions in a search space. Five high-dimensional biomedical datasets are used to test the ability of MSBGWO in feature selection. The experimental results of MSBGWO are superior in terms of classification accuracy, precision, recall, F-measure, and number of features selected when compared to those of the binary grey wolf optimizer version 2 (BGWO2), binary genetic algorithm (BGA), binary particle swarm optimization (BPSO), differential evolution (DE) algorithm, and sine-cosine algorithm (SCA).

Particle Swarm Optimized Hybrid Kernel-Based Multiclass Support Vector Machine for Microarray Cancer Data Analysis

2019

Determining an optimal decision model is an important but difficult combinatorial task in imbalanced microarray-based cancer classification. Though the multiclass support vector machine (MCSVM) has already made an important contribution in this field, its performance depends solely on three aspects: the penalty factor C, the type of kernel, and the kernel parameters. To improve the performance of this classifier in microarray-based cancer analysis, this paper proposes a PSO-PCA-LGP-MCSVM model that is based on particle swarm optimization (PSO), principal component analysis (PCA), and the multiclass support vector machine (MCSVM). The MCSVM is based on a hybrid kernel, i.e., linear-Gaussian-polynomial (LGP), that combines the advantages of three standard kernels (linear, Gaussian, and polynomial) in a novel manner, where the linear kernel is linearly combined with the Gaussian kernel embedding the polynomial kernel. Further, this paper proves that the LGP kernel satisfies the properties of a valid kernel. To demonstrate the effectiveness of the model, several experiments were conducted and the results compared between the proposed model and three single kernel-based models, namely, PSO-PCA-L-MCSVM (utilizing a linear kernel), PSO-PCA-G-MCSVM (utilizing a Gaussian kernel), and PSO-PCA-P-MCSVM (utilizing a polynomial kernel). For the comparison, two binary and two multiclass imbalanced standard microarray datasets were used. Experimental results in terms of three extended assessment metrics (F-score, G-mean, and Accuracy) reveal the superior global feature extraction, prediction, and learning abilities of this model against the three single kernel-based models.
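The abstract above describes the LGP kernel only in words, so its exact formula is not available here. The sketch below is one possible reading, offered as an assumption: the Gaussian kernel is evaluated on distances measured in the polynomial kernel's feature space, and the result is mixed with the linear kernel through a weight in [0, 1]. All function names, parameters, and default values are hypothetical.

```python
import numpy as np


def linear_kernel(X, Y):
    """Plain inner-product kernel between rows of X and rows of Y."""
    return X @ Y.T


def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    """Inhomogeneous polynomial kernel (x . y + coef0) ** degree."""
    return (X @ Y.T + coef0) ** degree


def lgp_kernel(X, Y, weight=0.5, gamma=0.1, degree=2, coef0=1.0):
    """Assumed hybrid: weight * linear + (1 - weight) * Gaussian-of-polynomial."""
    k_xy = polynomial_kernel(X, Y, degree, coef0)
    # Diagonal entries k(x, x) and k(y, y) of the polynomial kernel.
    k_xx = (np.sum(X * X, axis=1) + coef0) ** degree
    k_yy = (np.sum(Y * Y, axis=1) + coef0) ** degree
    # Squared distance in the polynomial feature space, clipped at zero
    # to guard against tiny negative values from rounding.
    sq_dist = np.maximum(k_xx[:, None] - 2.0 * k_xy + k_yy[None, :], 0.0)
    gaussian_of_poly = np.exp(-gamma * sq_dist)
    return weight * linear_kernel(X, Y) + (1.0 - weight) * gaussian_of_poly
```

Under this reading, each term is a valid kernel when coef0 is non-negative and degree is a positive integer, and a non-negative weighted sum of valid kernels is again valid, so the Gram matrix on any sample should be symmetric and positive semidefinite up to rounding.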

An Efficient PCA-GA-HKSVM-Based Disease Diagnostic Assistant

Jerop

2021

Disease diagnosis faces challenges such as misdiagnosis, lack of diagnosis, and slow diagnosis. Several machine learning techniques have been applied to address these challenges, where a set of symptoms is fed to a classification model that predicts the presence or absence of a disease. To improve on the performance of these techniques, this paper presents a technique that involves feature selection using principal component analysis (PCA), a hybrid kernel-based support vector machine (HKSVM) classification model, and hyperparameter optimization using a genetic algorithm (GA). The HKSVM in this paper introduces a new way of combining three kernels: radial basis function (RBF), linear, and polynomial. Combining local (RBF) and global (linear and polynomial) kernels improves model performance, because local kernels are better able to distinguish points close to each other while global kernels are better suited to distinguishing points that are far apart. The PCA-GA-HKSVM is used on 7 different medical datasets, 2 of them multiclass and 5 binary. The performance evaluation metrics used were accuracy, precision, and recall. It was observed that the PCA-GA-HKSVM offered better performance than the single-kernel support vector machines (SVMs).

An Excited Binary Grey Wolf Optimizer for Feature Selection in Highly Dimensional Datasets

2020