Inflated prediction accuracy of neuropsychiatric biomarkers caused by data leakage in feature selection

Shim, Miseon; Lee, Seung Hwan; Hwang, Han-Jeong

doi:10.1038/s41598-021-87157-3

Cited by 22 publications

(14 citation statements)

References 15 publications

(11 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We also used 5-fold cross-validation to prevent overfitting; AUC= 1.0 corresponds to perfect discrimination, and AUC= 0.5 corresponds to random discrimination. Here, the principal component analysis was conducted within cross validation (Shim et al, 2021) to avoid the inaccurate estimation of performance of discrimination. AUC values were averaged among 20 trials to choose tested and evaluated data set in 5-fold cross-validation and their standard deviations (SD) were also derived.…”

Section: Discussionmentioning

confidence: 99%

Alteration of Neural Network Activity With Aging Focusing on Temporal Complexity and Functional Connectivity Within Electroencephalography

Ando

Nobukawa

Kikuchi

et al. 2022

Front. Aging Neurosci.

View full text Add to dashboard Cite

With the aging process, brain functions, such as attention, memory, and cognitive functions, degrade over time. In a super-aging society, the alteration of neural activity owing to aging is considered crucial for interventions for the prevention of brain dysfunction. The complexity of temporal neural fluctuations with temporal scale dependency plays an important role in optimal brain information processing, such as perception and thinking. Complexity analysis is a useful approach for detecting cortical alteration in healthy individuals, as well as in pathological conditions, such as senile psychiatric disorders, resulting in changes in neural activity interactions among a wide range of brain regions. Multi-fractal (MF) and multi-scale entropy (MSE) analyses are known methods for capturing the complexity of temporal scale dependency of neural activity in the brain. MF and MSE analyses exhibit high accuracy in detecting changes in neural activity and are superior with regard to complexity detection when compared with other methods. In addition to complex temporal fluctuations, functional connectivity reflects the integration of information of brain processes in each region, described as mutual interactions of neural activity among brain regions. Thus, we hypothesized that the complementary relationship between functional connectivity and complexity could improve the ability to detect the alteration of spatiotemporal patterns observed on electroencephalography (EEG) with respect to aging. To prove this hypothesis, this study investigated the relationship between the complexity of neural activity and functional connectivity in aging based on EEG findings. Concretely, MF and MSE analyses were performed to evaluate the temporal complexity profiles, and phase lag index analyses assessing the unique profile of functional connectivity were performed based on the EEGs conducted for young and older participants. Subsequently, these profiles were combined through machine learning. We found that the complementary relationship between complexity and functional connectivity improves the classification accuracy among aging participants. Thus, the outcome of this study could be beneficial in formulating interventions for the prevention of age-related brain dysfunction.

show abstract

Section: Discussionmentioning

confidence: 99%

Alteration of Neural Network Activity With Aging Focusing on Temporal Complexity and Functional Connectivity Within Electroencephalography

Ando

Nobukawa

Kikuchi

et al. 2022

Front. Aging Neurosci.

View full text Add to dashboard Cite

show abstract

“…We compare the brain-PAD scores of the controls in this hold-out validation set to the MDD patients in the test set. As the validation set is not involved in the development of the brain age prediction model, the risk of overfitting is effectively prevented [64]. The application of four different machine learning algorithms allows us to further validate the consistency of the patterns observed.…”

Section: Discussionmentioning

confidence: 99%

Accelerated functional brain aging in major depressive disorder: evidence from a large scale fMRI analysis of Chinese participants

Luo,

Chen,

Qiu

et al. 2022

Preprint

View full text Add to dashboard Cite

Major depressive disorder (MDD) is one of the most common mental health conditions that has been intensively investigated for its association with brain atrophy and mortality. Recent studies reveal that the deviation between the predicted and the chronological age can be a marker of accelerated brain aging to characterize MDD. However, current conclusions are usually drawn based on structural MRI information collected from Caucasian participants. The universality of this biomarker needs to be further validated by subjects with different ethnic/racial backgrounds and by different types of data. Here we make use of the REST-meta-MDD, a large scale resting-state fMRI dataset collected from multiple cohort participants in China. We develop a stacking machine learning model based on 1101 healthy controls, which estimates a subject's chronological age from fMRI with promising accuracy. The trained model is then applied to 1276 MDD patients from 24 sites. We observe that MDD patients exhibit a +4.43 years (p < 0.0001, Cohen's d = 0.35, 95% CI :1.86 − 3.91) higher brain-predicted age difference (brain-PAD) compared to controls. In the MDD subgroup, we observe a statistically significant +2.09 years (p < 0.05, Cohen's d = 0.134483) brain-PAD in antidepressant users compared to medication-free patients. The statistical relationship observed is further checked by three different machine learning algorithms. The positive brain-PAD observed in participants in China confirms the presence of accelerated brain aging in MDD patients. The utilization of functional brain connectivity for age estimation verifies existing findings from a new dimension.

show abstract

“…The organized total data comprised 1695 datasets which were used in pipeline for the 10-fold cross-validation of the four ML models considered in this study, i.e., the logistic regression model, support vector machine, random forest model, and multilayer perceptron, to generate 30-day hospital readmission predictions by identifying the nonlinear classifying characteristic relationships among different activity-based PA parameters ( Table 1 ) and actual hospital readmissions. We adopted 10-fold cross-validation (sometimes as blocked cross-validation for timeseries splits) to prevent any data leakage [ 32 , 33 , 34 ]. The ML models were trained by a combination of supervised and reinforced learning methods, and their performance was 10-fold cross-validated using the total dataset to acquire a final trained ML model with an averaged training score.…”

Section: Methodsmentioning

confidence: 99%

Machine Learning-Based 30-Day Hospital Readmission Predictions for COPD Patients Using Physical Activity Data of Daily Living with Accelerometer-Based Device

Verma

Lin

2022

Biosensors

View full text Add to dashboard Cite

Chronic obstructive pulmonary disease (COPD) is a significantly concerning disease, and is ranked highest in terms of 30-day hospital readmission. Generally, physical activity (PA) of daily living reflects the health status and is proposed as a strong indicator of 30-day hospital readmission for patients with COPD. This study attempted to predict 30-day hospital readmission by analyzing continuous PA data using machine learning (ML) methods. Data were collected from 16 patients with COPD over 3877 days, and clinical information extracted from the patients’ hospital records. Activity-based parameters were conceptualized and evaluated, and ML models were trained and validated to retrospectively analyze the PA data, identify the nonlinear classification characteristics of different risk factors, and predict hospital readmissions. Overall, this study predicted 30-day hospital readmission and prediction performance is summarized as two distinct approaches: prediction-based performance and event-based performance. In a prediction-based performance analysis, readmissions predicted with 70.35% accuracy; and in an event-based performance analysis, the total 30-day readmissions were predicted with a precision of 72.73%. PA data reflect the health status; thus, PA data can be used to predict hospital readmissions. Predicting readmissions will improve patient care, reduce the burden of medical costs burden, and can assist in staging suitable interventions, such as promoting PA, alternate treatment plans, or changes in lifestyle to prevent readmissions.

show abstract

Inflated prediction accuracy of neuropsychiatric biomarkers caused by data leakage in feature selection

Cited by 22 publications

References 15 publications

Alteration of Neural Network Activity With Aging Focusing on Temporal Complexity and Functional Connectivity Within Electroencephalography

Alteration of Neural Network Activity With Aging Focusing on Temporal Complexity and Functional Connectivity Within Electroencephalography

Accelerated functional brain aging in major depressive disorder: evidence from a large scale fMRI analysis of Chinese participants

Machine Learning-Based 30-Day Hospital Readmission Predictions for COPD Patients Using Physical Activity Data of Daily Living with Accelerometer-Based Device

Contact Info

Product

Resources

About