Purpose: To develop and validate a deep learning algorithm that predicts the final diagnosis of Alzheimer disease (AD), mild cognitive impairment, or neither at fluorine 18 (18F) fluorodeoxyglucose (FDG) PET of the brain and to compare its performance with that of radiologic readers. Materials and Methods: Prospective 18F-FDG PET brain images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (2109 imaging studies from 2005 to 2017, 1002 patients) and a retrospective independent test set (40 imaging studies from 2006 to 2016, 40 patients) were collected. The final clinical diagnosis at follow-up was recorded. A convolutional neural network with the InceptionV3 architecture was trained on 90% of the ADNI data set and tested on the remaining 10%, as well as on the independent test set, with performance compared with that of radiologic readers. The model was analyzed with sensitivity, specificity, receiver operating characteristic (ROC) analysis, saliency maps, and t-distributed stochastic neighbor embedding. Results: The algorithm achieved an area under the ROC curve of 0.98 (95% confidence interval: 0.94, 1.00) for predicting the final clinical diagnosis of AD in the independent test set (82% specificity at 100% sensitivity), an average of 75.8 months prior to the final diagnosis, which in ROC space outperformed reader performance (57% [four of seven] sensitivity, 91% [30 of 33] specificity; P < .05). Saliency maps demonstrated attention to known areas of interest but with focus on the entire brain. Conclusion: By using fluorine 18 fluorodeoxyglucose PET of the brain, a deep learning algorithm developed for early prediction of Alzheimer disease achieved 82% specificity at 100% sensitivity, an average of 75.8 months prior to the final diagnosis.
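The evaluation metrics in the abstract above (area under the ROC curve, and sensitivity/specificity at an operating point) can be sketched in a short, self-contained example. The scores and labels below are illustrative only, not data from the study, and the threshold is an arbitrary assumption.

```python
# Minimal sketch of ROC-style evaluation: AUC via the rank (Mann-Whitney)
# formulation, plus sensitivity and specificity at a chosen threshold.
# Scores/labels are made up for illustration.

def auc(scores, labels):
    """Area under the ROC curve: fraction of positive-negative pairs
    in which the positive case receives the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold):
    """Sensitivity (true-positive rate) and specificity (true-negative
    rate) when calling scores >= threshold positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    p = sum(labels)
    n = len(labels) - p
    return tp / p, tn / n

scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(auc(scores, labels))              # 0.9375
print(sens_spec(scores, labels, 0.35))  # (1.0, 0.75)
```

Lowering the threshold trades specificity for sensitivity, which is how an operating point such as "82% specificity at 100% sensitivity" is selected along the ROC curve.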
IMPORTANCE Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. OBJECTIVE To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms. DESIGN, SETTING, AND PARTICIPANTS In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016. MAIN OUTCOMES AND MEASUREMENTS Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated. RESULTS Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden).
Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity. CONCLUSIONS AND RELEVANCE While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods to enhance mammography screening interpretation.
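The ensembling idea reported above, blending several algorithms' scores with the radiologist's recall decision, can be sketched as follows. The abstract does not specify the aggregation rule, so the simple weighted average and the 0.5 weight below are assumptions for illustration only.

```python
# Hypothetical sketch of score-level ensembling: average the AI
# algorithms' cancer scores, then blend in the radiologist's binary
# recall assessment. The weighting scheme is assumed, not the study's.

def ensemble_score(algorithm_scores, radiologist_recall, radiologist_weight=0.5):
    """Blend the mean algorithm score (0-1) with the radiologist's
    0/1 recall decision using a fixed convex weight."""
    mean_algo = sum(algorithm_scores) / len(algorithm_scores)
    return ((1 - radiologist_weight) * mean_algo
            + radiologist_weight * float(radiologist_recall))

# Same algorithm scores; the radiologist's recall shifts the blend up.
low = ensemble_score([0.2, 0.3, 0.1], radiologist_recall=0)
high = ensemble_score([0.2, 0.3, 0.1], radiologist_recall=1)
print(low, high)
```

A blend like this can raise specificity at fixed sensitivity because the two inputs make partially independent errors, which is the mechanism the study's ensemble result suggests.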
Purpose To develop deep learning (DL) models to predict best-corrected visual acuity (BCVA) from optical coherence tomography (OCT) images from patients with neovascular age-related macular degeneration (nAMD). Methods Retrospective analysis of OCT images and associated BCVA measurements from the phase 3 HARBOR trial (NCT00891735). DL regression models were developed to predict BCVA at the concurrent visit and 12 months from baseline using OCT images. Binary classification models were developed to predict BCVA of Snellen equivalent of <20/40, <20/60, and ≤20/200 at the concurrent visit and 12 months from baseline. Results The regression model to predict BCVA at the concurrent visit had R² = 0.67 (root-mean-square error [RMSE] = 8.60) in study eyes and R² = 0.84 (RMSE = 9.01) in fellow eyes. The best classification model to predict BCVA at the concurrent visit had an area under the receiver operating characteristic curve (AUC) of 0.92 in study eyes and 0.98 in fellow eyes. The regression model to predict BCVA at month 12 using baseline OCT had R² = 0.33 (RMSE = 14.16) in study eyes and R² = 0.75 (RMSE = 11.27) in fellow eyes. The best classification model to predict BCVA at month 12 had AUC = 0.84 in study eyes and AUC = 0.96 in fellow eyes. Conclusions DL shows promise in predicting BCVA from OCT images in nAMD. Further research should elucidate the utility of models in clinical settings. Translational Relevance DL models predicting BCVA could be used to enhance understanding of structure–function relationships and develop more efficient clinical trials.
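The regression metrics reported above (R² and RMSE) can be computed with a short sketch. The letter-score values below are hypothetical examples, not HARBOR trial data.

```python
# Minimal sketch of the two regression metrics: coefficient of
# determination (R^2) and root-mean-square error (RMSE).
import math

def r2_rmse(y_true, y_pred):
    """R^2 = 1 - SS_res/SS_tot; RMSE = sqrt(mean squared residual)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

bcva_true = [70, 65, 55, 80, 60]  # ETDRS-style letter scores (hypothetical)
bcva_pred = [68, 66, 58, 76, 62]
r2, rmse = r2_rmse(bcva_true, bcva_pred)
print(r2, rmse)
```

Note that R² compares model error against the variance of the true values, so a model can have a modest RMSE in letters yet a low R² if the cohort's acuity range is narrow, one reason the month-12 study-eye result (R² = 0.33, RMSE = 14.16) differs so much from the concurrent-visit result.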
Purpose To examine deep learning (DL)–based methods for accurate segmentation of geographic atrophy (GA) lesions using fundus autofluorescence (FAF) and near-infrared (NIR) images. Methods This retrospective analysis utilized imaging data from study eyes of patients enrolled in Proxima A and B (NCT02479386; NCT02399072) natural history studies of GA. Two multimodal DL networks (UNet and YNet) were used to automatically segment GA lesions on FAF; segmentation accuracy was compared with annotations by experienced graders. The training data set comprised 940 image pairs (FAF and NIR) from 183 patients in Proxima B; the test data set comprised 497 image pairs from 154 patients in Proxima A. Dice coefficient scores, Bland–Altman plots, and Pearson correlation coefficient (r) were used to assess performance. Results On the test set, Dice scores for the DL network to grader comparison ranged from 0.89 to 0.92 for the screening visit; the Dice score between graders was 0.94. GA lesion area correlations (r) for YNet versus grader, UNet versus grader, and between graders were 0.981, 0.959, and 0.995, respectively. Longitudinal GA lesion area enlargement correlations (r) for screening to 12 months (n = 53) were lower (0.741, 0.622, and 0.890, respectively) compared with the cross-sectional results at screening. Longitudinal correlations (r) from screening to 6 months (n = 77) were even lower (0.294, 0.248, and 0.686, respectively). Conclusions Multimodal DL networks to segment GA lesions can produce accurate results comparable with expert graders. Translational Relevance DL-based tools may support efficient and individualized assessment of patients with GA in clinical research and practice.
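The Dice coefficient used above to compare DL segmentations with grader annotations can be sketched in a few lines. The masks below are flat binary lists for illustration; in practice they would be 2D FAF image masks.

```python
# Minimal sketch of the Dice coefficient for binary segmentation masks:
# Dice = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical).

def dice(mask_a, mask_b):
    """Dice score between two equal-length binary masks (0/1 values)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * inter / total if total else 1.0  # two empty masks agree

pred   = [1, 1, 1, 0, 0, 1]  # hypothetical DL network output
grader = [1, 1, 0, 0, 1, 1]  # hypothetical grader annotation
print(dice(pred, grader))    # 0.75
```

Because Dice is normalized by the total segmented area, inter-grader Dice (0.94 here) serves as a natural ceiling against which the network-to-grader scores (0.89 to 0.92) are judged.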