Background Deep learning offers considerable promise for medical diagnostics. We aimed to evaluate the diagnostic accuracy of deep learning algorithms versus health-care professionals in classifying diseases using medical imaging.

Methods In this systematic review and meta-analysis, we searched Ovid-MEDLINE, Embase, Science Citation Index, and Conference Proceedings Citation Index for studies published from Jan 1, 2012, to June 6, 2019. Studies comparing the diagnostic performance of deep learning models and health-care professionals based on medical imaging, for any disease, were included. We excluded studies that used medical waveform data or graphics material, or that investigated the accuracy of image segmentation rather than disease classification. We extracted binary diagnostic accuracy data and constructed contingency tables to derive the outcomes of interest: sensitivity and specificity. Studies undertaking an out-of-sample external validation were included in a meta-analysis, using a unified hierarchical model. This study is registered with PROSPERO, CRD42018091176.

Findings Our search identified 31 587 studies, of which 82 (describing 147 patient cohorts) were included. 69 studies provided enough data to construct contingency tables, enabling calculation of test accuracy, with sensitivity ranging from 9.7% to 100.0% (mean 79.1%, SD 0.2) and specificity ranging from 38.9% to 100.0% (mean 88.3%, SD 0.1). An out-of-sample external validation was done in 25 studies, of which 14 compared deep learning models and health-care professionals on the same sample. Comparison of performance between deep learning models and health-care professionals in these 14 studies, restricting the analysis to the contingency table reporting the highest accuracy for each study, found a pooled sensitivity of 87.0% (95% CI 83.0-90.2) for deep learning models and 86.4% (79.9-91.0) for health-care professionals, and a pooled specificity of 92.5% (95% CI 85.1-96.4) for deep learning models and 90.5% (80.6-95.7) for health-care professionals.

Interpretation Our review found the diagnostic performance of deep learning models to be equivalent to that of health-care professionals. However, a major finding of the review is that few studies presented externally validated results or compared deep learning models and health-care professionals on the same sample. Additionally, poor reporting is prevalent in deep learning studies, which limits reliable interpretation of the reported diagnostic accuracy. New reporting standards that address the specific challenges of deep learning could improve future studies, enabling greater confidence in the results of future evaluations of this promising technology.
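As a concrete illustration of the test-accuracy computation described in the Methods, the sketch below derives sensitivity and specificity from a single 2x2 contingency table. The counts are invented for illustration; the pooled estimates in the review come from a hierarchical meta-analysis model, not from per-study values like these.

```python
# Sensitivity and specificity from one binary diagnostic 2x2 contingency table,
# as extracted per study in the review. All counts here are illustrative.
def test_accuracy(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    sensitivity = tp / (tp + fn)  # proportion of diseased cases correctly flagged
    specificity = tn / (tn + fp)  # proportion of healthy cases correctly cleared
    return sensitivity, specificity

sens, spec = test_accuracy(tp=87, fp=8, fn=13, tn=92)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```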
Background Deep learning has the potential to transform health care; however, substantial expertise is required to train such models. We sought to evaluate the utility of automated deep learning software for the development of medical image diagnostic classifiers by health-care professionals with no coding and no deep learning expertise.

Methods We used five publicly available open-source datasets: retinal fundus images (MESSIDOR); optical coherence tomography (OCT) images (Guangzhou Medical University and Shiley Eye Institute, version 3); images of skin lesions (Human Against Machine [HAM] 10000); and both paediatric and adult chest x-ray (CXR) images (Guangzhou Medical University and Shiley Eye Institute, version 3, and the National Institutes of Health [NIH] dataset, respectively). Each dataset was separately fed into a neural architecture search framework, hosted through Google Cloud AutoML, that automatically developed a deep learning architecture to classify common diseases. Sensitivity (recall), specificity, and positive predictive value (precision) were used to evaluate the diagnostic properties of the models. Discriminative performance was assessed using the area under the precision-recall curve (AUPRC). For the deep learning model developed on a subset of the HAM10000 dataset, we did external validation using the Edinburgh Dermofit Library dataset.

Findings Diagnostic properties and discriminative performance from internal validations were high in the binary classification tasks (sensitivity 73.3-97.0%; specificity 67-100%; AUPRC 0.87-1.00). In the multiple classification tasks, sensitivity ranged from 38% to 100% and specificity from 67% to 100%. Discriminative performance in terms of AUPRC ranged from 0.57 to 1.00 across the five automated deep learning models. In the external validation using the Edinburgh Dermofit Library dataset, the automated deep learning model showed an AUPRC of 0.47, with a sensitivity of 49% and a positive predictive value of 52%.

Interpretation All models, except the automated deep learning model trained on the multilabel classification task of the NIH CXR14 dataset, showed discriminative performance and diagnostic properties comparable to state-of-the-art deep learning algorithms. Performance in the external validation study was low. The quality of the open-access datasets (including insufficient information about patient flow and demographics) and the absence of measures of precision, such as confidence intervals, were the major limitations of this study. The availability of automated deep learning platforms provides an opportunity for the medical community to enhance their understanding of model development and evaluation. Although the derivation of classification models without requiring a deep understanding of the mathematical, statistical, and programming principles is attractive, comparable performance to expertly designed models is limited to more elementary classification tasks. Furthermore, care should be placed in adhering t...
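The metrics this study reports (sensitivity/recall, specificity, PPV/precision, and AUPRC) can be reproduced for any binary classifier. A minimal sketch using scikit-learn, with made-up labels and scores standing in for model outputs:

```python
# Illustrative computation of recall (sensitivity), specificity, precision (PPV),
# and AUPRC for a binary classifier; labels and scores below are synthetic.
import numpy as np
from sklearn.metrics import average_precision_score, confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # ground-truth labels
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1])   # predicted probabilities
y_pred = (y_score >= 0.5).astype(int)                           # decision threshold of 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity (recall):", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("PPV (precision):", tp / (tp + fp))
# Average precision is a standard estimator of the area under the PR curve.
print("AUPRC:", average_precision_score(y_true, y_score))
```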
Progression to exudative 'wet' age-related macular degeneration (exAMD) is a major cause of visual deterioration. For patients diagnosed with exAMD in one eye, we introduce an artificial intelligence (AI) system to predict progression to exAMD in the second eye. By combining models based on 3D optical coherence tomography images and corresponding automatic tissue maps, our system predicts conversion to exAMD within a clinically actionable 6-month time window, achieving a per-volumetric-scan sensitivity of 80% at 55% specificity, and 34% sensitivity at 90% specificity. This level of performance corresponds to true positives in 78% and 41% of individual eyes, and false positives in 56% and 17% of individual eyes, at the high-sensitivity and high-specificity points, respectively. Moreover, we show that automatic tissue segmentation can identify anatomical changes before conversion, as well as high-risk subgroups. This AI system overcomes substantial interobserver variability in expert predictions, performing better than five out of six experts, and demonstrates the potential of using AI to predict disease progression.
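The two operating points quoted above (80% sensitivity at 55% specificity, and 34% at 90%) correspond to different score thresholds on the same underlying model. A hedged sketch of how such a point can be read off a ROC curve, with synthetic scores standing in for the system's per-scan outputs:

```python
# Selecting an operating point: find the threshold that meets a target
# specificity and report the sensitivity achieved there. Labels and scores
# are synthetic stand-ins for per-volumetric-scan model outputs.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                      # 1 = eye converts to exAMD
y_score = np.clip(rng.normal(0.3 + 0.4 * y_true, 0.2), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
specificity = 1.0 - fpr
target_specificity = 0.90                                  # the high-specificity point
# Specificity decreases along the curve, so the last index meeting the target
# gives the highest sensitivity achievable at that specificity.
idx = np.flatnonzero(specificity >= target_specificity)[-1]
print(f"threshold {thresholds[idx]:.2f}: sensitivity {tpr[idx]:.1%} "
      f"at specificity {specificity[idx]:.1%}")
```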
Recent improvements in ophthalmic imaging have led to the identification of a thickened choroid, or pachychoroid, as being associated with a number of retinal diseases. The number of conditions linked to this phenotype continues to widen, with specific endophenotypes identified within the pachychoroid spectrum. The spectrum includes choroidal features such as focal or diffuse choroidal thickening with thinning of the overlying inner choroid, and choroidal hyperpermeability as demonstrated by indocyanine green angiography. In addition, these diseases are associated with overlying retinal pigmentary changes and retinal pigment epithelial dysfunction, and may also be associated with choroidal neovascularization. This article provides a comprehensive review of the literature on diseases currently described within the pachychoroid spectrum, including central serous chorioretinopathy, pachychoroid pigment epitheliopathy, pachychoroid neovasculopathy, polypoidal choroidal vasculopathy/aneurysmal type 1 neovascularization, peripapillary pachychoroid disease, and focal choroidal excavation. We particularly focus on clinical imaging, genetics, and pathological findings in these conditions, with the aim of updating evidence suggesting a common aetiology among diseases within the pachychoroid spectrum.