Ensemble Feature Selection Compared to Meta-analysis for Breast Cancer Biomarker Identification from Microarray Data

Trevizan, Bernardo; Recamonde‐Mendoza, Mariana

doi:10.1007/978-3-030-86653-2_12

Cited by 3 publications

(1 citation statement)

References 22 publications

“…Traditionally, ML methods employ a conventional, random-based sampling strategy when performing the training step (15, 17). However, in presence of imbalanced data there is a potential benefit in opting for stratified sampling.…”

Section: Introduction (mentioning)

confidence: 99%

Optimizing hybrid ensemble feature selection strategies for transcriptomic biomarker discovery in complex diseases

Claude, Leclercq, Thébault et al. 2024

NAR Genomics and Bioinformatics

Biomedical research takes advantage of omic data, such as transcriptomics, to unravel the complexity of diseases. A conventional strategy identifies transcriptomic biomarkers characterized by expression patterns associated with a phenotype by relying on feature selection approaches. Hybrid ensemble feature selection (HEFS) has become increasingly popular as it ensures robustness of the selected features by performing data and functional perturbations. However, it remains difficult to make the best suited choices at each step when designing such approaches. We conducted an extensive analysis of four possible HEFS scenarios for the identification of Stage IV colorectal, Stage I kidney and lung and Stage III endometrial cancer biomarkers from transcriptomic data. These scenarios investigate the use of two types of feature reduction by filters (differentially expressed genes and variance) conjointly with two types of resampling strategies (repeated holdout by distribution-balanced stratified and random stratified) for downstream feature selection through an aggregation of thousands of wrapped machine learning models. Based on our results, we emphasize the advantages of using HEFS approaches to identify complex disease biomarkers, given their ability to produce generalizable and stable results to both data and functional perturbations. Finally, we highlight critical issues that need to be considered in the design of such strategies.
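
The abstract combines resampling by repeated stratified holdout with an aggregation of many feature selection runs. The following is a minimal sketch of that general idea on toy data, not the authors' pipeline. All function names (`stratified_holdout`, `top_k_by_mean_difference`, `selection_frequency`) are hypothetical, the mean-difference filter is a simple stand-in for a differential expression filter, and the code assumes binary labels with at least two samples per class.

```python
import random
from collections import Counter, defaultdict


def stratified_holdout(labels, test_fraction, rng):
    """Split sample indices into train/test while preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Assumption: every class has at least two samples, so train stays non-empty.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


def top_k_by_mean_difference(rows, labels, train_idx, k):
    """Toy filter: rank features by absolute difference of class means (labels 0/1)."""
    scores = []
    for j in range(len(rows[0])):
        pos = [rows[i][j] for i in train_idx if labels[i] == 1]
        neg = [rows[i][j] for i in train_idx if labels[i] == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def selection_frequency(rows, labels, n_repeats=50, test_fraction=0.25, k=1, seed=0):
    """Repeat the stratified holdout and report how often each feature is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_repeats):
        train_idx, _ = stratified_holdout(labels, test_fraction, rng)
        counts.update(top_k_by_mean_difference(rows, labels, train_idx, k))
    return {j: c / n_repeats for j, c in counts.items()}


# Imbalanced toy data: 8 negatives, 4 positives.
# Feature 0 separates the classes; feature 1 is uninformative.
labels = [0] * 8 + [1] * 4
rows = [[0.1 * i, i % 3] for i in range(8)] + [[5 + 0.1 * i, i % 3] for i in range(4)]

frequencies = selection_frequency(rows, labels)
```

Because each split is drawn per class, the minority class appears in every training and test set, which is the benefit of stratified over purely random sampling noted in the citation statement. Features with a selection frequency close to 1 are stable under data perturbation; a fuller ensemble would also vary the selection method itself (functional perturbation).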
