A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction

Spooner, Annette; Chen, Emily; Sowmya, Arcot; Sachdev, Perminder S.; Kochan, Nicole A.; Trollor, Julian N.; Brodaty, Henry

doi:10.1038/s41598-020-77220-w

Cited by 167 publications (173 citation statements)

References 37 publications

Supporting

Mentioning

148

Contrasting

“…This leads to conclusions that can be completed with the results of this work, since this will give information of when that hospital admission will take place. However, machine learning-based approaches are not the best option when working with censored data, although they are specially helpful when handling high-dimensional clinical data [ 64 ]. Note that, in a context with censored data, it is not possible to apply directly machine learning classical models since they do not account for censored observations.…”

Section: Discussion

mentioning

confidence: 99%

Cure models to estimate time until hospitalization due to COVID-19

2021

A short introduction to survival analysis and censored data is included in this paper. A thorough literature review in the field of cure models has been done. An overview on the most important and recent approaches on parametric, semiparametric and nonparametric mixture cure models is also included. The main nonparametric and semiparametric approaches were applied to a real time dataset of COVID-19 patients from the first weeks of the epidemic in Galicia (NW Spain). The aim is to model the elapsed time from diagnosis to hospital admission. The main conclusions, as well as the limitations of both the cure models and the dataset, are presented, illustrating the usefulness of cure models in this kind of studies, where the influence of age and sex on the time to hospital admission is shown.

“…Because of the curse of dimensionality, the high-dimensional data from gene expression profiles present challenges for the use of traditional feature selection methods, including overfitting, weak generalization ability, and high variance [ 36 ]. The relationship between the samples and features of the cancer datasets is formulated by the following matrix:

where x_m is defined as the m link of the characteristic vector, and y_m describes the column vector representing the sample categories.…”

Section: Methods

mentioning

confidence: 99%

“…Integrative Feature Selection Scheme (FRL) for Identifying Multiple Genomic Biomarkers. Because of the curse of dimensionality, the high-dimensional data from gene expression profiles present challenges for the use of traditional feature selection methods, including overfitting, weak generalization ability, and high variance [36]. The relationship between the samples and features of the cancer datasets is formulated by the following matrix:…”

mentioning

confidence: 99%

FRL: An Integrative Feature Selection Algorithm Based on the Fisher Score, Recursive Feature Elimination, and Logistic Regression to Identify Potential Genomic Biomarkers

Luo, Zhang, et al. 2021

BioMed Research International

Accurate screening on cancer biomarkers contributes to health assessment, drug screening, and targeted therapy for precision medicine. The rapid development of high-throughput sequencing technology has identified abundant genomic biomarkers, but most of them are limited to single-cancer analysis. Based on the combination of Fisher score, Recursive feature elimination, and Logistic regression (FRL), this paper proposes an integrative feature selection algorithm named FRL to explore potential cancer genomic biomarkers on cancer subsets. Fisher score is initially used to calculate the weights of genes to rapidly reduce the dimension. Recursive feature elimination and Logistic regression are then jointly employed to extract the optimal subset. Compared to the current differential expression analysis tool GEO2R based on the Limma algorithm, FRL has greater classification precision than Limma. Compared with five traditional feature selection algorithms, FRL exhibits excellent performance on accuracy (ACC) and F1-score and greatly improves computational efficiency. On high-noise datasets such as esophageal cancer, the ACC of FRL is 30% superior to the average ACC achieved with other traditional algorithms. As biomarkers found in multiple studies are more reliable and reproducible, and reveal stronger association on potential clinical value than single analysis, through literature review and spatial analyses of gene functional enrichment and functional pathways, we conduct cluster analysis on 10 diverse cancers with high mortality and form a potential biomarker module comprising 19 genes. All genes in this module can serve as potential biomarkers to provide more information on the overall oncogenesis mechanism for the detection of diverse early cancers and assist in targeted anticancer therapies for further developments in precision medicine.
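As a rough illustration of the first FRL step described in the abstract above (weighting genes by Fisher score to reduce the dimension quickly), here is a minimal sketch. It is not the authors' implementation. The function name `fisher_scores`, the toy data, and the specific formula (between-class scatter divided by within-class scatter, per feature) are assumptions made for this example; the abstract does not state which variant of the Fisher score the paper uses.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature (column) of X.

    Assumed formula for this sketch, for feature j:
        sum_c n_c * (mean_cj - mean_j)^2  /  sum_c n_c * var_cj
    X is a list of rows (samples), y is a list of class labels.
    """
    n_features = len(X[0])

    # Group the sample rows by class label.
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)

        between = 0.0  # between-class scatter
        within = 0.0   # within-class scatter
        for rows in rows_by_class.values():
            values = [row[j] for row in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c

        if within > 0:
            scores.append(between / within)
        elif between > 0:
            # Classes are perfectly separated with zero spread.
            scores.append(float("inf"))
        else:
            # Constant feature: carries no class information.
            scores.append(0.0)
    return scores


# Hypothetical toy data: 4 samples, 2 "genes", 2 classes.
X = [[1.0, 5.0], [2.0, 4.0], [9.0, 5.0], [10.0, 4.0]]
y = [0, 0, 1, 1]

scores = fisher_scores(X, y)
# Feature indices ordered from most to least discriminative.
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In this toy data the first feature separates the two classes and the second does not, so the first feature ranks higher. In a pipeline like the one the abstract describes, only the top-ranked features would then be passed on to recursive feature elimination with logistic regression.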

“…Without the presence of censoring, standard LR could be used. Traditionally, the Cox proportional hazard (CPH) model has been the most widely used model to analyse censored data, but the CPH model often works for small datasets and does not scale well to high dimensions and large volumes of clinical data [ 26 ].…”

Section: Modeling the Likelihood Of Clinical Outcomes

mentioning

confidence: 99%

Current Trends in Readmission Prediction: An Overview of Approaches

et al. 2021

Hospital readmission shortly after discharge threatens the quality of patient care and leads to increased medical care costs. In the United States, hospitals with high readmission rates are subject to federal financial penalties. This concern calls for incentives for healthcare facilities to reduce their readmission rates by predicting patients who are at high risk of readmission. Conventional practices involve the use of rule-based assessment scores and traditional statistical methods, such as logistic regression, in developing risk prediction models. The recent advancements in machine learning driven by improved computing power and sophisticated algorithms have the potential to produce highly accurate predictions. However, the value of such models could be overrated. Meanwhile, the use of other flexible models that leverage simple algorithms offer great transparency in terms of feature interpretation, which is beneficial in clinical settings. This work presents an overview of the current trends in risk prediction models developed in the field of readmission. The various techniques adopted by researchers in recent years are described, and the topic of whether complex models outperform simple ones in readmission risk stratification is investigated.
