2023
DOI: 10.1371/journal.pone.0284150
Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques

Abstract: With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration regarding which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection techniques (FS), designed to reduce the dimensiona…

Cited by 7 publications (6 citation statements) · References 88 publications (141 reference statements)
“…The genetic algorithm has proven effective in several contexts, including the work of Kabir et al. [20], which introduced a redundancy reduction approach. In our study and in that of Hayet et al. [29], the genetic algorithms provided remarkable results, although the choice of appropriate objective functions could have improved the performance. Focusing on the detection of COVID-19, the study by Hayet et al. [29] highlights the importance of specific variables, such as CRP, Respiratory Rate, Oxygen Saturation, and LDH.…”
Section: Variable Reduction (supporting)
confidence: 58%
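The genetic-algorithm approach this excerpt refers to can be sketched as a minimal wrapper: candidate feature subsets are bit masks, and an objective function scores each mask. The feature names, relevance scores, and toy fitness function below are purely illustrative assumptions (a real wrapper would score each mask by training a classifier on the selected columns), not values from the cited papers.

```python
# Minimal sketch of GA-based feature selection; all names and scores
# below are illustrative stand-ins, not data from the cited studies.
import random

random.seed(0)

FEATURES = ["CRP", "RespRate", "SpO2", "LDH", "Age", "Noise1", "Noise2", "Noise3"]
# Hypothetical per-feature relevance; a real wrapper would instead train a
# classifier on each candidate subset and use its validation score.
RELEVANCE = [0.9, 0.8, 0.85, 0.7, 0.4, 0.05, 0.02, 0.03]

def fitness(mask):
    # Reward relevant features, penalise subset size (parsimony pressure).
    gain = sum(r for r, m in zip(RELEVANCE, mask) if m)
    return gain - 0.1 * sum(mask)

def evolve(pop_size=20, generations=40, p_mut=0.1):
    n = len(FEATURES)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return [f for f, m in zip(FEATURES, best) if m]

selected = evolve()
```

The excerpt's caveat about objective functions maps directly onto `fitness`: swapping in a different score (e.g. cross-validated AUC minus a size penalty) changes which subsets the GA favours.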
“…In this work, we sought to use diverse algorithms inspired by evolution, physical mechanisms and collective intelligence, as mentioned in the work of Agrawal et al. [28] It should be noted that the algorithms applied were quite simple, trying to preserve the minimal version of each one to encourage equality of conditions, so they are not directly comparable with those of other works that present cumulative improvements to said algorithms. [20,29,21,22,23] Relevant differences with said works will be discussed below.…”
Section: Variable Reduction (mentioning)
confidence: 99%
“…Table 3 covers the main clinical characterization of our clustering results per explanatory variable, focusing on the 18/92 variables with statistically large inter-phenotype effect sizes. This subset of 18 key variables can also be viewed as a data-driven selection of the most informative factors for predicting the clinical outcomes under study (severity, mortality) [24,29]. In particular, phenotype C (low prevalence, 9.0%) included older patients, with more comorbidities, worse respiratory status (peripheral oxygenation, as well as in the arterial blood gas tests), and more unfavourable inflammatory, renal and/or hematologic biomarkers (C-reactive protein, procalcitonin, D-dimer, neutrophil-to-lymphocyte ratio, creatinine, BUN, prothrombin, etc.)…”
Section: Discussion (mentioning)
confidence: 99%
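One common way to quantify a "statistically large inter-phenotype effect size" is Cohen's d with a pooled standard deviation; the excerpt does not name the exact statistic used, so the following is only a hedged sketch with toy C-reactive protein values, not the study's data or its actual threshold.

```python
# Sketch: screening explanatory variables by effect size between one
# phenotype and the rest. Cohen's d is an assumption here, as is the
# |d| >= 0.8 "large" convention; the CRP values are toy data.
from statistics import mean, variance

def cohens_d(group_a, group_b):
    # Standardised mean difference with pooled sample variance.
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Illustrative CRP values (mg/L): phenotype C vs the remaining clusters.
crp_phenotype_c = [110, 140, 95, 160, 120]
crp_others = [20, 35, 15, 40, 25, 30]

d = cohens_d(crp_phenotype_c, crp_others)
is_large = abs(d) >= 0.8  # conventional "large effect" cut-off
```

Repeating such a screen per variable and keeping those above the threshold yields a shortlist analogous to the 18 key variables described above.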
“…From the demographic and clinical information collected at the baseline time of hospitalization, 92 explanatory variables met our criterion of <60% missingness, whereas another 14 variables (e.g. ferritin, bilirubin, albumin, troponin, interleukin-6 (IL-6), aspartate aminotransferase (AST), creatine phosphokinase (CPK), platelets or eosinophils) failed to meet this data quality criterion (see [24] for further details). Once categorical variables were transformed via one-hot encoding, these 92 attributes became d=109 features.…”
Section: Cohort (mentioning)
confidence: 99%
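The two preprocessing steps this excerpt describes, dropping variables at or above 60% missingness and then one-hot encoding the categorical ones, can be sketched as follows. The toy records and variable names are illustrative assumptions, not the actual cohort data.

```python
# Sketch of the preprocessing pipeline: missingness filter, then one-hot
# encoding. Records and variable names are toy examples only.
records = [
    {"age": 67, "sex": "M", "ward": "ICU", "ferritin": None},
    {"age": 54, "sex": "F", "ward": "gen", "ferritin": None},
    {"age": 71, "sex": "M", "ward": None,  "ferritin": 820},
    {"age": 49, "sex": "F", "ward": "gen", "ferritin": None},
]

def keep_variable(name, rows, max_missing=0.60):
    # Keep only variables with <60% missing values, mirroring the criterion.
    missing = sum(r[name] is None for r in rows) / len(rows)
    return missing < max_missing

kept = [v for v in records[0] if keep_variable(v, records)]
# "ferritin" is 75% missing in this toy cohort, so it is dropped.

def one_hot(rows, variables):
    # Expand each categorical value into an indicator feature "var=value";
    # numeric values pass through unchanged, missing values are skipped.
    encoded = []
    for r in rows:
        row = {}
        for v in variables:
            val = r[v]
            if isinstance(val, str):
                row[f"{v}={val}"] = 1
            elif val is not None:
                row[v] = val
        encoded.append(row)
    return encoded

features = one_hot(records, kept)
```

This is how 92 kept attributes can grow into d=109 features: each categorical variable with k levels contributes k indicator columns instead of one.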
“…This process removes irrelevant and redundant features while keeping the most discriminative ones, allowing for training predictive models with higher discrimination power and better performance. Feature selection is evaluated by measuring the quality of the classification obtained with the selected subset of features (Hayet-Otero et al., 2023; Dabba et al., 2021; Bommert et al., 2020; Liu et al., 2002). Like feature importance, feature selection is also a model-dependent approach.…”
(mentioning)
confidence: 99%
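The wrapper-style evaluation this excerpt describes, scoring a candidate feature subset by the classification quality it yields, can be sketched with a deliberately simple nearest-centroid classifier. The classifier choice and the toy severity data are illustrative assumptions, not the models or data used in the cited works.

```python
# Sketch of wrapper evaluation: a feature subset is scored by the
# held-out accuracy of a classifier trained on just those features.
# Nearest-centroid is a stand-in; the data below are toy values.

def centroid(rows):
    # Component-wise mean of a list of equal-length vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def evaluate_subset(X, y, subset, X_test, y_test):
    project = lambda row: [row[j] for j in subset]
    cents = {label: centroid([project(x) for x, l in zip(X, y) if l == label])
             for label in set(y)}

    def predict(x):
        px = project(x)
        # Assign the class whose centroid is nearest (squared Euclidean).
        return min(cents, key=lambda l: sum((a - b) ** 2
                                            for a, b in zip(px, cents[l])))

    hits = sum(predict(x) == t for x, t in zip(X_test, y_test))
    return hits / len(X_test)

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
X_train = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.1, 0.5]]
y_train = ["mild", "mild", "severe", "severe"]
X_test = [[0.0, 2.0], [1.0, 3.0]]
y_test = ["mild", "severe"]

acc_informative = evaluate_subset(X_train, y_train, [0], X_test, y_test)
acc_noise = evaluate_subset(X_train, y_train, [1], X_test, y_test)
```

Because the score depends on the classifier that consumes the subset, the same subset can rank differently under different models, which is the model-dependence the excerpt points out.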