Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

Sharifai, Garba Abdulrauf; Zainol, Zurinahni

doi:10.3390/genes11070717

Cited by 25 publications

(6 citation statements)

References 76 publications

(91 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Implementation was performed in MATLAB R 2015. The datasetsused are Lung Cancer [14], Prostrate Tumor from repository and SRBCT [14]. The details of the dataset are listed in table 5.1.…”

Section: Resultsmentioning

confidence: 99%

See 1 more Smart Citation

Pearson’s Redundancy Multi-Filtering with BAT Algorithm for Selecting High Dimensional Imbalanced Features

Alluhaidan¹,

Prabu²,

Sivakumar³

2021

Preprint

View full text Add to dashboard Cite

Feature selection plays a vital role for every data analysis application. Feature selection aims to choose prominent set of features after removing redundant and irrelevant features from original set of features. High Dimensional dataset poses a challenging task for Machine Learning algorithms. Many state-of-art solutions were developed to handle this issue. High dimensionality in addition to imbalance ratio in the dataset becomes a tedious task. To overcome the issue, this paper introduces a novel method namely Pearson’s Redundancy Based Multi Filter algorithm with improved BAT algorithm (PRBMF-iBAT) to obtain multiple feature subsets. PRBMF is implemented using multiple filters to obtain highly relevant features. iBAT algorithm uses these features to find best subset of features for classification. The results prove that PRBMF-iBAT perform better for the classifier in terms of Accuracy, Precision, Recall and F- Measure for three micro array datasets with SVM classifier. The proposed system achieves 97.99% of accuracy as highest compared to the existing rCBR-BGOA algorithm.

show abstract

“…Implementation was performed in MATLAB R 2015. The datasetsused are Lung Cancer [14], Prostrate Tumor from repository and SRBCT [14]. The details of the dataset are listed in table 5.1.…”

Section: Resultsmentioning

confidence: 99%

“…F-measure could be an appropriate measure to evaluate the efficiency of proposed method. Proposed PRBMF-iBAT is compared with Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA) [14 ]for Support Vector Machine. Fivefold cross validation is performed to compare the results.…”

Section: Resultsmentioning

confidence: 99%

Pearson’s Redundancy Multi-Filtering with BAT Algorithm for Selecting High Dimensional Imbalanced Features

Alluhaidan¹,

Prabu²,

Sivakumar³

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Identifying the most important features was based on the two most used feature selection filter methods in ML: (1) feature importance and (2) correlation-based feature selection. We used filter methods of feature selection because it is independent of the potential models [ 11 ]. Feature importance is a univariate filter that compares each feature’s correlation with the outcome separately and removes features with zero importance according to a gradient boosting machine (GBM) learning model.…”

Section: Methodsmentioning

confidence: 99%

Prediction Model of Anastomotic Leakage Among Esophageal Cancer Patients After Receiving an Esophagectomy: Machine Learning Approach

Zhao¹,

Cheng²,

Sun³

et al. 2021

JMIR Med Inform

View full text Add to dashboard Cite

Background Anastomotic leakage (AL) is one of the severe postoperative adverse events (5%-30%), and it is related to increased medical costs in cancer patients who undergo esophagectomies. Machine learning (ML) methods show good performance at predicting risk for AL. However, AL risk prediction based on ML models among the Chinese population is unavailable. Objective This study uses ML techniques to develop and validate a risk prediction model to screen patients with emerging AL risk factors. Methods Analyses were performed using medical records from 710 patients who underwent esophagectomies at the National Clinical Research Center for Cancer between January 2010 and May 2015. We randomly split (9:1) the data set into a training data set of 639 patients and a testing data set of 71 patients using a computer algorithm. We assessed multiple classification tools to create a multivariate risk prediction model. Our ML algorithms contained decision tree, random forest, naive Bayes, and logistic regression with least absolute shrinkage and selection operator. The optimal AL prediction model was selected based on model evaluation metrics. Results The final risk panel included 36 independent risk features. Of those, 10 features were significantly identified by the logistic model, including aortic calcification (OR 2.77, 95% CI 1.32-5.81), celiac trunk calcification (OR 2.79, 95% CI 1.20-6.48), forced expiratory volume 1% (OR 0.51, 95% CI 0.30-0.89); TLco (OR 0.56, 95% CI 0.27-1.18), peripheral vascular disease (OR 4.97, 95% CI 1.44-17.07), laparoscope (OR 3.92, 95% CI 1.23-12.51), postoperative length of hospital stay (OR 1.17, 95% CI 1.13-1.21), vascular permeability activity (OR 0.46, 95% CI 0.14-1.48), and fat liquefaction of incisions (OR 4.36, 95% CI 1.86-10.21). Logistic regression with least absolute shrinkage and selection operator offered the highest prediction quality with an area under the receiver operator characteristic of 72% in the training data set. The testing model also achieved similar high performance. Conclusions Our model offered a prediction of AL with high accuracy, assisting in AL prevention and treatment. A personalized ML prediction model with a purely data-driven selection of features is feasible and effective in predicting AL in patients who underwent esophagectomy.

show abstract

“…Due to inconsistent presentations of training intensities from various training methods such as "6 repetition maximum (RM)" in strength work, "85% of 1 RM" in weightlifting, or "bodyweight" in plyometrics, the "intensity" was discarded, but instead, the input of multiple training methods was allowed such that lower limb strength training represents training intensity with the use of at least 80% 1 RM in no less than two weeks of training. Since the type of sports background of subjects were diverse in the selected studies, they were summarized as "vertical based sports", "horizontal based sports", and "other sports" based on the characterized nature of sports movements to avoid an imbalanced dataset or cardinality issues [44]. Furthermore, training programs of intervention studies varied training volumes in different phases or periods.…”

Section: Identification Of Predictorsmentioning

confidence: 99%

Using Machine Learning Algorithms to Pool Data from Meta-Analysis for the Prediction of Countermovement Jump Improvement

Ho¹,

Weldon

Yong³

et al. 2023

IJERPH

View full text Add to dashboard Cite

To solve the research–practice gap and take one step forward toward using big data with real-world evidence, the present study aims to adopt a novel method using machine learning to pool findings from meta-analyses and predict the change of countermovement jump. The data were collected through a total of 124 individual studies included in 16 recent meta-analyses. The performance of four selected machine learning algorithms including support vector machine, random forest (RF) ensemble, light gradient boosted machine, and the neural network using multi-layer perceptron was compared. The RF yielded the highest accuracy (mean absolute error: 0.071 cm; R2: 0.985). Based on the feature importance calculated by the RF regressor, the baseline CMJ (“Pre-CMJ”) was the most impactful predictor, followed by age (“Age”), the total number of training sessions received (“Total number of training_session”), controlled or non-controlled conditions (“Control (no training)”), whether the training program included squat, lunge, deadlift, or hip thrust exercises (“Squat_Lunge_Deadlift_Hipthrust_True”, “Squat_Lunge_Deadlift_Hipthrust_False”), or “Plyometric (mixed fast/slow SSC)”, and whether the athlete was from an Asian pacific region including Australia (“Race_Asian or Australian”). By using multiple simulated virtual cases, the successful predictions of the CMJ improvement are shown, whereas the perceived benefits and limitations of using machine learning in a meta-analysis are discussed.

show abstract

Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

Cited by 25 publications

References 76 publications

Pearson’s Redundancy Multi-Filtering with BAT Algorithm for Selecting High Dimensional Imbalanced Features

Pearson’s Redundancy Multi-Filtering with BAT Algorithm for Selecting High Dimensional Imbalanced Features

Prediction Model of Anastomotic Leakage Among Esophageal Cancer Patients After Receiving an Esophagectomy: Machine Learning Approach

Using Machine Learning Algorithms to Pool Data from Meta-Analysis for the Prediction of Countermovement Jump Improvement

Contact Info

Product

Resources

About