A novel feature selection method via mining Markov blanket

Khan, Waqar Ali; Kong, Lingfu; Noman, Sohail M.; Brekhna, Brekhna

doi:10.1007/s10489-022-03863-z

Cited by 5 publications

(4 citation statements)

References 36 publications

“…Our approach’s central basis was that protein sets which are crucial in distinguishing disease states may be key biological drivers of the disease ( 32 ). We developed a novel ML methodology that employs auxiliary Markov blanket feature selection ( 77, 78 ) combined with multiple recursive feature selection algorithms to mitigate bias towards any specific algorithm ( 79 ) and reduce overfitting, which is the fundamental challenge considering the inherent low sample size and high dimensionality of our, and many others, proteomics datasets. The first step of our method was the creation of Leave-One-Out (LOO) partitions of our data ( 35 ).…”

Section: Methods (mentioning)

confidence: 99%
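
The Leave-One-Out step mentioned in the statement above can be illustrated with a minimal sketch. It assumes a toy dataset and a hypothetical stand-in selector (`top_k_by_mean_gap`, which ranks features by the gap between class means); neither comes from the citing paper, whose actual pipeline combines Markov blanket selection with several recursive feature selection algorithms. The sketch only shows how selection can be repeated on each partition and tallied.

```python
from collections import Counter


def leave_one_out_partitions(n_samples):
    """Yield (train_indices, held_out_index), holding out each sample once."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def top_k_by_mean_gap(X, y, rows, k):
    """Hypothetical stand-in selector (an assumption, not the paper's method):
    rank features by the absolute gap between the two class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        scores.append((gap, j))
    # Largest gap first; ties broken by feature index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]


def selection_frequency(X, y, k):
    """Count how often each feature is selected across all LOO partitions."""
    votes = Counter()
    for train, _ in leave_one_out_partitions(len(X)):
        votes.update(top_k_by_mean_gap(X, y, train, k))
    return votes


# Toy data (illustrative only): feature 0 separates the classes, the rest do not.
X = [
    [5.0, 1.0, 0.2],
    [5.2, 0.9, 0.8],
    [4.9, 1.1, 0.5],
    [1.0, 1.0, 0.4],
    [1.1, 0.8, 0.9],
    [0.9, 1.2, 0.1],
]
y = [1, 1, 1, 0, 0, 0]
votes = selection_frequency(X, y, k=1)
```

Features that are selected in most or all partitions are less likely to reflect a quirk of any single sample, which matters when the sample size is small relative to the number of features.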

“…The suite of algorithms employed included RFE with Logistic Regression (LR) with L1 and L2 regularization penalties, respectively ( 30, 31 ), RFE with regularized Linear Discriminant Analysis (rLDA) ( 80 ), RFE with Random Forests (RF) ( 29 ), Boruta - Random Forests ( 81 ), and Maximum-Relevance-Minimum-Redundancy (MRMR) with an F-Statistic evaluator ( 82 ). Markov blanket feature selection was employed separately on the original datasets, due to computational expense and subsequently incorporated during the later aggregation steps ( 77, 78 ).…”

Section: Methods (mentioning)

confidence: 99%

“…Additionally, the Boruta-RF method was integrated to effectively identify crucial features in high-dimensional datasets by comparing real features against randomly generated shadow features, optimizing feature selection under the n <<< p constraint. The Markov Blanket approach was effective for focusing on complex, non-linear variables most relevant to the target, thus efficiently reducing dimensionality ( 77 ). Finally, the MRMR method was used for balancing feature relevance and minimizing redundancy, crucial for predictive accuracy in datasets with numerous features.…”

Section: Methods (mentioning)

confidence: 99%
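
The shadow-feature comparison described in the statement above can be sketched as follows. This is a simplified illustration, not Boruta itself: absolute Pearson correlation is swapped in for random-forest importance, the statistical test Boruta applies to the hit counts is omitted, and the data, function names, and parameters are all assumptions made for the example.

```python
import random
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy))


def shadow_hits(columns, y, n_rounds=20, seed=0):
    """Count the rounds in which each real feature beats the best shadow feature."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # A shadow feature is a shuffled copy: same values, no link to the target.
        shadows = []
        for col in columns:
            shuffled = col[:]
            rng.shuffle(shuffled)
            shadows.append(shuffled)
        threshold = max(abs_pearson(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if abs_pearson(col, y) > threshold:
                hits[j] += 1
    return hits


# Toy data (illustrative only).
n = 40
y = [i % 2 for i in range(n)]
informative = [2.0 * label + 0.1 * (i % 5) for i, label in enumerate(y)]
uninformative = [float((i // 2) % 2) for i in range(n)]  # zero correlation with y
hits = shadow_hits([informative, uninformative], y)
```

A feature that consistently outscores the best shadow is treated as relevant; one that never does is a candidate for removal.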

Plasma Proteomics of Genetic Brain Arteriosclerosis and Dementia Syndrome Identifies Signatures of Fibrosis, Angiogenesis, and Metabolic Alterations

Keller, Radabaugh, Karvelas et al. 2024

Preprint

Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) is the most common monogenic form of vascular cognitive impairment and dementia. A genetic arteriolosclerotic disease, the molecular mechanisms driving vascular brain degeneration and decline remain unclear. With the goal of driving discovery of disease-relevant biological perturbations in CADASIL, we used machine learning approaches to extract proteomic disease signatures from large-scale proteomics generated from plasma collected from three distinct cohorts in US and Colombia: CADASIL-Early (N = 53), CADASIL-Late (N = 45), and CADASIL-Colombia (N = 71). We extracted molecular signatures with high predictive value for early and late-stage CADASIL and performed robust cross- and external-validation. We examined the biological and clinical relevance of our findings through pathway enrichment analysis and testing of associations with clinical outcomes. Our study represents a model for unbiased discovery of molecular signatures and disease biomarkers, combining non-invasive plasma proteomics with clinical data. We report on novel disease-associated molecular signatures for CADASIL, derived from the accessible plasma proteome, with relevance to vascular cognitive impairment and dementia.

“…One way to achieve it is by using bootstrapping with replacement to generate the training set for developing each DT's unique feature set. However, features considered for splitting each node are not chosen from the full feature set but rather from a subset of features [45]. In addition, be aware that RF is more akin to an unintelligible black box model.…”

Section: Model Framework and Parameters (mentioning)

confidence: 99%
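
The two sources of randomness described in the statement above (bootstrap sampling with replacement for each tree, and a random subset of features considered at each split) can be sketched as below. The function names, the seed, and the square-root default for the subset size are assumptions for illustration; the citing paper does not state its settings in this excerpt.

```python
import random
from math import isqrt


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one tree's training set)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, rng, max_features=None):
    """Pick the features a single split may consider, without replacement.

    The square-root default is a common heuristic, assumed here for illustration.
    """
    if max_features is None:
        max_features = max(1, isqrt(n_features))
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(42)
n_samples, n_features = 10, 16

tree_rows = bootstrap_indices(n_samples, rng)
# Rows never drawn for this tree are "out-of-bag" and can serve as a check set.
out_of_bag = sorted(set(range(n_samples)) - set(tree_rows))
split_candidates = candidate_features(n_features, rng)
```

Because every tree sees a different resample and every split a different feature subset, the trees are decorrelated, which is also part of why the ensemble as a whole is harder to interpret than a single decision tree.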

Machine Learning Techniques for Antimicrobial Resistance Prediction of Pseudomonas Aeruginosa from Whole Genome Sequence Data

Noman, Zeeshan, Arshad et al. 2023

Computational Intelligence and Neuroscience

Self Cite

Aim. Due to the growing availability of genomic datasets, machine learning models have shown impressive diagnostic potential in identifying emerging and reemerging pathogens. This study aims to use machine learning techniques to develop and compare models for predicting bacterial resistance to a panel of 12 classes of antibiotics using whole genome sequence (WGS) data of Pseudomonas aeruginosa. Method. Random Forest (RF) and BioWeka were used for classification accuracy assessment, and logistic regression (LR) for statistical analysis. Results. Our results show 44.66% of isolates were resistant to twelve antimicrobial agents and 55.33% were sensitive. The mean classification accuracy was ≥98% for BioWeka and ≥96% for RF on these families of antimicrobials. With 10-fold cross-validation, accuracies were 99.31% and 94.00% for ampicillin, 99.02% and 95.21% for amoxicillin, 98.27% and 96.63% for meropenem, 99.73% and 98.34% for cefepime, 96.44% and 99.23% for fosfomycin, 98.63% and 94.31% for ceftazidime, 98.71% and 96.00% for chloramphenicol, 95.76% and 97.63% for erythromycin, 99.27% and 98.25% for tetracycline, 98.00% and 97.30% for gentamycin, 99.57% and 98.03% for butirosin, and 96.17% and 98.97% for ciprofloxacin. In addition, for eight of the twelve drugs, no false-positive or false-negative bacterial strains were found. Conclusion. The ability to accurately detect antibiotic resistance could help clinicians make educated decisions about empiric therapy based on the local antibiotic resistance pattern. Moreover, if such prescribing practices become widespread, they may have major consequences for infection prevention and human health.
