Objectives Scoring laboratory polysomnography (PSG) data remains a manual task of visually annotating 3 primary categories: sleep stages, sleep disordered breathing, and limb movements. Attempts to automate this process have been hampered by the complexity of PSG signals and physiological heterogeneity between patients. Deep neural networks, which have recently achieved expert-level performance for other complex medical tasks, are ideally suited to PSG scoring, given sufficient training data. Methods We used a combination of deep recurrent and convolutional neural networks (RCNN) for supervised learning of clinical labels designating sleep stages, sleep apnea events, and limb movements. The data for testing and training were derived from 10,000 clinical PSGs and 5,804 research PSGs. Results When trained on the clinical dataset, the RCNN reproduces PSG diagnostic scoring for sleep staging, sleep apnea, and limb movements with accuracies of 87.6%, 88.2%, and 84.7% on held-out test data, a level of performance comparable to human experts. The RCNN model performs equally well when tested on the independent research PSG database. Only small reductions in accuracy were noted when training on limited channels to mimic at-home monitoring devices: frontal leads only for sleep staging, and thoracic belt signals only for the apnea-hypopnea index. Conclusions By creating accurate deep learning models for sleep scoring, our work opens the path toward broader and more timely access to sleep diagnostics. Accurate scoring automation can improve the utility and efficiency of in-lab and at-home approaches to sleep diagnostics, potentially extending the reach of sleep expertise beyond specialty clinics.
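A minimal sketch of the kind of recurrent-convolutional network described in this abstract, written in PyTorch. The channel count, layer sizes, epoch length, and sampling rate are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): CNN front end per 30-second PSG epoch,
# followed by a recurrent layer over consecutive epochs, for 5-stage scoring.
import torch
import torch.nn as nn

class SleepRCNN(nn.Module):
    def __init__(self, n_channels=6, n_stages=5):
        super().__init__()
        # Convolutional front end extracts per-epoch features from raw signals
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Recurrent layer models temporal context across consecutive epochs
        self.rnn = nn.GRU(input_size=64, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_stages)

    def forward(self, x):
        # x: (batch, n_epochs, n_channels, samples_per_epoch)
        b, t, c, s = x.shape
        feats = self.conv(x.view(b * t, c, s)).squeeze(-1).view(b, t, -1)
        out, _ = self.rnn(feats)   # (batch, n_epochs, 64)
        return self.head(out)      # per-epoch stage logits

# Example: 2 records, 20 epochs each, 6 channels sampled at 100 Hz for 30 s
logits = SleepRCNN()(torch.randn(2, 20, 6, 3000))
print(logits.shape)  # torch.Size([2, 20, 5])
```

The same backbone could, in principle, be retrained on fewer input channels (e.g., frontal leads only) to mimic the at-home setting mentioned above.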
The human electroencephalogram (EEG) of sleep undergoes profound changes with age. These changes can be conceptualized as "brain age", which can be compared to an age norm to reflect deviation from the normal aging process. Here, we develop an interpretable machine learning model to predict brain age based on two large sleep EEG datasets: the Massachusetts General Hospital sleep lab dataset (MGH, N = 2,621), covering ages 18 to 80; and the Sleep Heart Health Study (SHHS, N = 3,520), covering ages 40 to 80. The model obtains a mean absolute deviation of 8.1 years between brain age and chronological age in the healthy participants in the MGH dataset. As validation, we analyze a subset of SHHS containing longitudinal EEGs recorded 5 years apart, which shows a 5.5-year difference in brain age. Participants with neurological and psychiatric diseases, as well as those taking diabetes or hypertension medications, show an older brain age compared to their chronological age. The findings raise the prospect of using sleep EEG as a biomarker for healthy brain aging. In total, we identify 2,621 EEGs, of which 189 are from participants with neurological or psychiatric diseases. Table 1 provides summary statistics for the dataset.
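A short illustration of the brain-age deviation and the mean absolute deviation reported above; the EEG-feature regression model itself is not reproduced here, and the variable names are assumptions for the sketch.

```python
# Illustrative computation of the brain-age deviation (predicted minus
# chronological age) and its mean absolute value across participants.
import numpy as np

def brain_age_deviation(predicted_brain_age, chronological_age):
    """Positive values mean the sleep EEG looks older than the chronological age."""
    return np.asarray(predicted_brain_age) - np.asarray(chronological_age)

# Toy example with three participants (years)
pred = np.array([62.0, 45.0, 71.0])    # model's brain-age estimates
chron = np.array([55.0, 47.0, 70.0])   # chronological ages

dev = brain_age_deviation(pred, chron)
mad = np.mean(np.abs(dev))             # analogous to the 8.1-year figure
print(dev, mad)                        # [ 7. -2.  1.] 3.33
```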
The PhysioNet/Computing in Cardiology Challenge 2018 focused on the use of various physiological signals (EEG, EOG, EMG, ECG, SaO2) collected during polysomnographic sleep studies to detect sources of arousal (non-apnea) during sleep. A total of 1,983 polysomnographic recordings were made available to the entrants. The arousal labels for 994 of the recordings were made available in a public training set while 989 labels were retained in a hidden test set. Challengers were asked to develop an algorithm that could label the presence of arousals within the hidden test set. The performance metric used to assess entrants was the area under the precision-recall curve. A total of twenty-two independent teams entered the Challenge, deploying a variety of methods from generalized linear models to deep neural networks.
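For reference, the Challenge metric (area under the precision-recall curve) can be approximated as below with scikit-learn; the official scoring code differs in detail (e.g., it pools sample-wise predictions across records), so this is only a sketch.

```python
# Sketch of the Challenge performance metric: area under the precision-recall
# curve for sample-wise arousal predictions, via average precision.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 1, 1, 0, 1])                 # 1 = within an arousal
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # model probabilities

auprc = average_precision_score(y_true, y_score)
print(f"AUPRC: {auprc:.3f}")
```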
Interictal epileptiform discharges (IEDs) in electroencephalograms (EEGs) are a biomarker of epilepsy, seizure risk, and clinical decline. However, there is a scarcity of experts qualified to interpret EEG results. Prior attempts to automate IED detection have been limited by small samples and have not demonstrated expert-level performance. There is a need for a validated automated method to detect IEDs with expert-level reliability. OBJECTIVE To develop and validate a computer algorithm with the ability to identify IEDs as reliably as experts and classify an EEG recording as containing IEDs vs no IEDs. DESIGN, SETTING, AND PARTICIPANTS A total of 9,571 scalp EEG records with and without IEDs were used to train a deep neural network (SpikeNet) to perform IED detection. Independent training and testing data sets were generated from 13,262 IED candidates, independently annotated by 8 fellowship-trained clinical neurophysiologists, and 8,520 EEG records containing no IEDs based on clinical EEG reports. Using the estimated spike probability, a classifier designating the whole EEG recording as positive or negative was also built. MAIN OUTCOMES AND MEASURES SpikeNet accuracy, sensitivity, and specificity compared with fellowship-trained neurophysiology experts for identifying IEDs and classifying EEGs as positive or negative for IEDs. Statistical performance was assessed via calibration error and area under the receiver operating characteristic curve (AUC). All performance statistics were estimated using 10-fold cross-validation. RESULTS SpikeNet surpassed both expert interpretation and an industry-standard commercial IED detector, based on calibration error (SpikeNet, 0.041; 95% CI, 0.033-0.049; vs industry standard, 0.066; 95% CI, 0.060-0.078; vs experts, mean, 0.183; range, 0.081-0.364) and binary classification performance based on AUC (SpikeNet, 0.980; 95% CI, 0.977-0.984; vs industry standard, 0.882; 95% CI, 0.872-0.893). Whole-EEG classification had a mean calibration error of 0.126 (range, 0.109-0.144) vs experts (mean, 0.197; range, 0.099-0.372) and an AUC of 0.847 (95% CI, 0.830-0.865). CONCLUSIONS AND RELEVANCE In this study, SpikeNet automatically detected IEDs and classified whole EEGs as IED-positive or IED-negative. This may be the first time an algorithm has been shown to exceed expert performance for IED detection in a representative sample of EEGs, and it may thus be a valuable tool for expedited review of EEGs.
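A rough sketch of how per-fold AUC and a simple binned calibration error could be estimated under 10-fold cross-validation, as in the evaluation described above. The classifier and features below are placeholders, not SpikeNet, and the calibration-error definition is one common binned variant.

```python
# Sketch: 10-fold cross-validated AUC and binned calibration error
# for a placeholder binary classifier over placeholder IED-candidate features.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between predicted probability and observed rate per bin."""
    bins = np.clip((y_prob * n_bins).astype(int), 0, n_bins - 1)
    err, total = 0.0, len(y_true)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.sum() / total * abs(y_prob[mask].mean() - y_true[mask].mean())
    return err

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                       # placeholder features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int) # placeholder labels

aucs, cals = [], []
for train, test in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    prob = LogisticRegression(max_iter=1000).fit(X[train], y[train]).predict_proba(X[test])[:, 1]
    aucs.append(roc_auc_score(y[test], prob))
    cals.append(calibration_error(y[test], prob))
print(np.mean(aucs), np.mean(cals))
```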
Training with a large data set enables automated sleep staging that compares favorably with human scorers. Because testing was performed on a large and heterogeneous data set, the performance estimate has low variance and is likely to generalize broadly.
The validity of using electroencephalograms (EEGs) to diagnose epilepsy requires reliable detection of interictal epileptiform discharges (IEDs). Prior interrater reliability (IRR) studies are limited by small samples and selection bias. OBJECTIVE To assess the reliability of experts in detecting IEDs in routine EEGs. DESIGN, SETTING, AND PARTICIPANTS This prospective analysis conducted in 2 phases included as participants physicians with at least 1 year of subspecialty training in clinical neurophysiology. In phase 1, 9 experts independently identified candidate IEDs in 991 EEGs (1 expert per EEG) reported in the medical record to contain at least 1 IED, yielding 87,636 candidate IEDs. In phase 2, the candidate IEDs were clustered into groups with distinct morphological features, yielding 12,602 clusters, and a representative candidate IED was selected from each cluster. We added 660 waveforms (11 random samples each from 60 randomly selected EEGs reported as being free of IEDs) as negative controls. Eight experts independently scored all 13,262 candidates as IEDs or non-IEDs. The 1,051 EEGs in the study were recorded at the Massachusetts General Hospital between 2012 and 2016. MAIN OUTCOMES AND MEASURES Primary outcome measures were percentage of agreement (PA) and beyond-chance agreement (Gwet κ) for individual IEDs (IED-wise IRR) and for whether an EEG contained any IEDs (EEG-wise IRR). Secondary outcomes were the correlations between numbers of IEDs marked by experts across cases, calibration of expert scoring to group consensus, and receiver operating characteristic analysis of how well multivariate logistic regression models may account for differences in IED scoring behavior between experts. RESULTS Among the 1,051 EEGs assessed in the study, 540 (51.4%) were those of females and 511 (48.6%) were those of males. In phase 1, 9 experts each marked potential IEDs in a median of 65 (interquartile range [IQR], 28-332) EEGs. The total number of IED candidates marked was 87,636. Expert IRR for the 13,262 individually annotated IED candidates was fair, with the mean PA being 72.4% (95% CI, 67.0%-77.8%) and mean κ being 48.7% (95% CI, 37.3%-60.1%). The EEG-wise IRR was substantial, with the mean PA being 80.9% (95% CI, 76.2%-85.7%) and mean κ being 69.4% (95% CI, 60.3%-78.5%). A statistical model based on waveform morphological features, when provided with individualized thresholds, explained the binary scores of all experts with a median accuracy of 80% (range, 73%-88%). CONCLUSIONS AND RELEVANCE This study's findings suggest that experts can identify whether EEGs contain IEDs with substantial reliability. Lower reliability regarding individual IEDs may be largely explained by various experts applying different thresholds to a common underlying statistical model.
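A two-rater, binary-label sketch of the agreement metrics used above: percentage agreement (PA) and Gwet's chance-corrected agreement (AC1). The study pools 8 experts; this pairwise version is for illustration only.

```python
# Sketch: percentage agreement and Gwet's AC1 for two raters, binary labels.
import numpy as np

def percent_agreement(a, b):
    return np.mean(np.asarray(a) == np.asarray(b))

def gwet_ac1(a, b):
    a, b = np.asarray(a), np.asarray(b)
    pa = np.mean(a == b)              # observed agreement
    q = (a.mean() + b.mean()) / 2     # average prevalence of the "IED" label
    pe = 2 * q * (1 - q)              # chance agreement under AC1
    return (pa - pe) / (1 - pe)

# Toy scores from two experts over 10 IED candidates (1 = IED, 0 = not IED)
rater1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(percent_agreement(rater1, rater2), gwet_ac1(rater1, rater2))  # 0.8, 0.6
```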
Background We sought to develop an automatable score to predict hospitalization, critical illness, or death for patients at risk for COVID-19 presenting for urgent care. Methods We developed the COVID-19 Acuity Score (CoVA) based on a single-center study of adult outpatients seen in respiratory illness clinics (RICs) or the emergency department (ED). Data were extracted from the Partners Enterprise Data Warehouse and split into development (n = 9,381, March 7-May 2) and prospective (n = 2,205, May 3-14) cohorts. Outcomes were hospitalization, critical illness (ICU or ventilation), or death within 7 days. Calibration was assessed using the expected-to-observed event ratio (E/O). Discrimination was assessed by the area under the receiver operating characteristic curve (AUC). Results In the prospective cohort, 26.1%, 6.3%, and 0.5% of patients experienced hospitalization, critical illness, or death, respectively. CoVA showed excellent performance in prospective validation for hospitalization (E/O: 1.01, AUC: 0.76); for critical illness (E/O: 1.03, AUC: 0.79); and for death (E/O: 1.63, AUC: 0.93). Among 30 predictors, the top five were age, diastolic blood pressure, blood oxygen saturation, COVID-19 testing status, and respiratory rate. Conclusions CoVA is a prospectively validated automatable score for the outpatient setting to predict adverse events related to COVID-19 infection.
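A brief sketch of the two validation metrics reported for CoVA: the expected-to-observed event ratio (calibration) and the AUC (discrimination). The score itself and its 30 predictors are not reproduced; the data below are toy values.

```python
# Sketch: E/O ratio (calibration) and AUC (discrimination) for one outcome.
import numpy as np
from sklearn.metrics import roc_auc_score

y_observed = np.array([0, 1, 0, 0, 1, 0, 1, 0])                 # e.g., hospitalization
p_predicted = np.array([0.2, 0.7, 0.1, 0.3, 0.8, 0.2, 0.6, 0.1]) # CoVA-style risks

e_over_o = p_predicted.sum() / y_observed.sum()  # ~1.0 indicates good calibration
auc = roc_auc_score(y_observed, p_predicted)
print(f"E/O: {e_over_o:.2f}, AUC: {auc:.2f}")
```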
Objective There are no validated methods for predicting the timing of seizures. Using machine learning, we sought to forecast 24‐hour risk of self‐reported seizure from e‐diaries. Methods Data from 5,419 patients on http://SeizureTracker.com (including seizure count, type, and duration) were split into training (3,806 patients/1,665,215 patient‐days) and testing (1,613 patients/549,588 patient‐days) sets with no overlapping patients. An artificial intelligence (AI) program, consisting of recurrent networks followed by a multilayer perceptron (“deep learning” model), was trained to produce risk forecasts. Forecasts were made from a sliding window of 3‐month diary history for each day of each patient's diary. After training, the model parameters were held constant and the testing set was scored. A rate‐matched random (RMR) forecast was compared to the AI. Comparisons were made using the area under the receiver operating characteristic curve (AUC), a measure of binary discrimination performance, and the Brier score, a measure of forecast calibration. The Brier skill score (BSS) measured the improvement of the AI Brier score compared to the benchmark RMR Brier score. Confidence intervals (CIs) on performance statistics were obtained via bootstrapping. Results The AUC was 0.86 (95% CI = 0.85–0.88) for AI and 0.83 (95% CI = 0.81–0.85) for RMR, favoring AI (p < 0.001). Overall (all patients combined), BSS was 0.27 (95% CI = 0.23–0.31), also favoring AI (p < 0.001). Interpretation The AI produced a valid forecast superior to a chance forecaster, and provided meaningful forecasts in the majority of patients. Future studies will be needed to quantify the clinical value of these forecasts for patients. ANN NEUROL 2020;88:588–595
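A sketch of the forecast-evaluation metrics in this abstract: the Brier score for the AI forecast and a benchmark, and the Brier skill score (BSS) measuring improvement over that benchmark. The benchmark is simplified here to a constant matched-rate forecast rather than the paper's rate-matched random forecaster, and all values are toy data.

```python
# Sketch: Brier score and Brier skill score for daily seizure-risk forecasts.
import numpy as np

def brier(y_true, y_prob):
    y_true, y_prob = np.asarray(y_true, float), np.asarray(y_prob, float)
    return np.mean((y_prob - y_true) ** 2)

# Toy daily seizure outcomes and 24-hour risk forecasts
y = np.array([0, 0, 1, 0, 1, 0, 0, 1])
p_ai = np.array([0.1, 0.2, 0.7, 0.1, 0.6, 0.2, 0.1, 0.8])
p_benchmark = np.full_like(p_ai, y.mean())     # simplified matched-rate benchmark

bss = 1.0 - brier(y, p_ai) / brier(y, p_benchmark)  # > 0 means AI beats benchmark
print(brier(y, p_ai), brier(y, p_benchmark), bss)
```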