Background: Deep learning offers considerable promise for medical diagnostics. We aimed to evaluate the diagnostic accuracy of deep learning algorithms versus health-care professionals in classifying diseases using medical imaging.

Methods: In this systematic review and meta-analysis, we searched Ovid-MEDLINE, Embase, Science Citation Index, and Conference Proceedings Citation Index for studies published from Jan 1, 2012, to June 6, 2019. Studies comparing the diagnostic performance of deep learning models and health-care professionals based on medical imaging, for any disease, were included. We excluded studies that used medical waveform data graphics material or investigated the accuracy of image segmentation rather than disease classification. We extracted binary diagnostic accuracy data and constructed contingency tables to derive the outcomes of interest: sensitivity and specificity. Studies undertaking an out-of-sample external validation were included in a meta-analysis, using a unified hierarchical model. This study is registered with PROSPERO, CRD42018091176.

Findings: Our search identified 31 587 studies, of which 82 (describing 147 patient cohorts) were included. 69 studies provided enough data to construct contingency tables, enabling calculation of test accuracy, with sensitivity ranging from 9.7% to 100.0% (mean 79.1%, SD 0.2) and specificity ranging from 38.9% to 100.0% (mean 88.3%, SD 0.1). An out-of-sample external validation was done in 25 studies, of which 14 made the comparison between deep learning models and health-care professionals in the same sample.
Comparison of the performance of deep learning models and health-care professionals in these 14 studies, when restricting the analysis to the contingency table for each study reporting the highest accuracy, found a pooled sensitivity of 87.0% (95% CI 83.0-90.2) for deep learning models and 86.4% (79.9-91.0) for health-care professionals, and a pooled specificity of 92.5% (95% CI 85.1-96.4) for deep learning models and 90.5% (80.6-95.7) for health-care professionals.

Interpretation: Our review found the diagnostic performance of deep learning models to be equivalent to that of health-care professionals. However, a major finding of the review is that few studies presented externally validated results or compared the performance of deep learning models and health-care professionals using the same sample. Additionally, poor reporting is prevalent in deep learning studies, which limits reliable interpretation of the reported diagnostic accuracy. New reporting standards that address specific challenges of deep learning could improve future studies, enabling greater confidence in the results of future evaluations of this promising technology.
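The sensitivity and specificity figures above are derived from 2x2 contingency tables extracted from each study. A minimal sketch of that calculation, using invented counts chosen only to illustrate (they happen to mirror the pooled deep learning estimates):

```python
# Sketch: deriving sensitivity and specificity from a 2x2 contingency
# table, as done for the 69 studies that reported enough data.
# The counts below are hypothetical, for illustration only.

def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from contingency-table counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    return sensitivity, specificity

# Hypothetical counts: 87 true positives, 13 false negatives,
# 185 true negatives, 15 false positives.
sens, spec = sensitivity_specificity(tp=87, fp=15, fn=13, tn=185)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
# → sensitivity=87.0%, specificity=92.5%
```

The pooled estimates in the review come from a hierarchical bivariate model across studies, not from summing counts, but each study's operating point enters the model as a pair like this one.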
Background: Deep learning has the potential to transform health care; however, substantial expertise is required to train such models. We sought to evaluate the utility of automated deep learning software for the development of medical image diagnostic classifiers by health-care professionals with no coding or deep learning expertise.

Methods: We used five publicly available open-source datasets: retinal fundus images (MESSIDOR); optical coherence tomography (OCT) images (Guangzhou Medical University and Shiley Eye Institute, version 3); images of skin lesions (Human Against Machine [HAM] 10000); and both paediatric and adult chest x-ray (CXR) images (Guangzhou Medical University and Shiley Eye Institute, version 3, and the National Institutes of Health [NIH] dataset, respectively). Each dataset was fed separately into a neural architecture search framework, hosted through Google Cloud AutoML, that automatically developed a deep learning architecture to classify common diseases. Sensitivity (recall), specificity, and positive predictive value (precision) were used to evaluate the diagnostic properties of the models. Discriminative performance was assessed using the area under the precision-recall curve (AUPRC). For the deep learning model developed on a subset of the HAM10000 dataset, we did external validation using the Edinburgh Dermofit Library dataset.

Findings: Diagnostic properties and discriminative performance from internal validations were high in the binary classification tasks (sensitivity 73.3-97.0%; specificity 67-100%; AUPRC 0.87-1.00). In the multiple classification tasks, the diagnostic properties ranged from 38% to 100% for sensitivity and from 67% to 100% for specificity. The discriminative performance in terms of AUPRC ranged from 0.57 to 1.00 across the five automated deep learning models.
In an external validation using the Edinburgh Dermofit Library dataset, the automated deep learning model showed an AUPRC of 0.47, with a sensitivity of 49% and a positive predictive value of 52%.

Interpretation: All models, except the automated deep learning model trained on the multilabel classification task of the NIH CXR14 dataset, showed discriminative performance and diagnostic properties comparable to state-of-the-art deep learning algorithms. Performance in the external validation study was low. The quality of the open-access datasets (including insufficient information about patient flow and demographics) and the absence of measures of precision, such as confidence intervals, constituted the major limitations of this study. The availability of automated deep learning platforms provides an opportunity for the medical community to enhance its understanding of model development and evaluation. Although the derivation of classification models without requiring a deep understanding of the mathematical, statistical, and programming principles is attractive, comparable performance to expertly designed models is limited to more elementary classification tasks. Furthermore, care should be placed in adhering t...
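The AUPRC used as the headline discriminative metric above can be computed by sweeping the classifier's score threshold and integrating precision over recall. A minimal pure-Python sketch, with invented labels and scores for illustration:

```python
# Sketch: precision-recall curve and AUPRC for a binary classifier.
# Labels (1 = diseased) and scores are invented for illustration.

def precision_recall_points(labels, scores):
    """Sweep thresholds from highest score down; return (precision, recall) pairs."""
    pairs = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _score, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points

def auprc(points):
    """Step-wise (rectangular) integration of precision over recall."""
    area, prev_recall = 0.0, 0.0
    for precision, recall in points:
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area

points = precision_recall_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2])
print(f"AUPRC = {auprc(points):.3f}")
# → AUPRC = 0.917
```

Unlike the area under the ROC curve, AUPRC has a chance-level baseline equal to the positive-class prevalence, which is why it is preferred for the imbalanced datasets typical of disease classification.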
Progression to exudative 'wet' age-related macular degeneration (exAMD) is a major cause of visual deterioration. We introduce an artificial intelligence (AI) system to predict, in patients diagnosed with exAMD in one eye, progression to exAMD in the second eye. By combining models based on 3D optical coherence tomography images and corresponding automatic tissue maps, our system predicts conversion to exAMD within a clinically actionable 6-month time window, achieving a per-volumetric-scan sensitivity of 80% at 55% specificity, and 34% sensitivity at 90% specificity. This level of performance corresponds to true positives in 78% and 41% of individual eyes, and false positives in 56% and 17% of individual eyes, at the high-sensitivity and high-specificity operating points, respectively. Moreover, we show that automatic tissue segmentation can identify anatomical changes prior to conversion and high-risk subgroups. This AI system overcomes substantial interobserver variability in expert predictions, performing better than five out of six experts, and demonstrates the potential of using AI to predict disease progression.
Recent improvements in ophthalmic imaging have led to the identification of a thickened choroid, or pachychoroid, as a feature associated with a number of retinal diseases. The number of conditions linked to this phenotype has continued to widen, with specific endophenotypes found within the pachychoroid spectrum. The spectrum includes choroidal features such as focal or diffuse choroidal thickening with thinning of the overlying inner choroid, and choroidal hyperpermeability as demonstrated by indocyanine green angiography. In addition, these diseases are associated with overlying retinal pigmentary changes and retinal pigment epithelial dysfunction, and may also be associated with choroidal neovascularization. This article provides a comprehensive review of the literature on diseases currently described within the pachychoroid spectrum, including central serous chorioretinopathy, pachychoroid pigment epitheliopathy, pachychoroid neovasculopathy, polypoidal choroidal vasculopathy/aneurysmal type 1 neovascularization, peripapillary pachychoroid disease, and focal choroidal excavation. We particularly focus on clinical imaging, genetics, and pathological findings in these conditions, with the aim of updating evidence suggesting a common aetiology between diseases within the pachychoroid spectrum.
A number of large technology companies have created code-free cloud-based platforms that allow researchers and clinicians without coding experience to create deep learning algorithms. In this study, we comprehensively analyse the performance and feature set of six platforms, using four representative cross-sectional and en-face medical imaging datasets to create image classification models. The mean (s.d.) F1 scores across platforms for all model–dataset pairs were as follows: Amazon, 93.9 (5.4); Apple, 72.0 (13.6); Clarifai, 74.2 (7.1); Google, 92.0 (5.4); MedicMind, 90.7 (9.6); Microsoft, 88.6 (5.3). The platforms demonstrated uniformly higher classification performance with the optical coherence tomography modality. Potential use cases given proper validation include research dataset curation, mobile ‘edge models’ for regions without internet access, and baseline models against which to compare and iterate bespoke deep learning approaches.
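The F1 score used to compare platforms above is the harmonic mean of precision (positive predictive value) and recall (sensitivity), which penalises a model that trades one heavily for the other. A minimal sketch, with illustrative values:

```python
# Sketch: the F1 score reported per platform-dataset pair is the
# harmonic mean of precision and recall. Inputs here are illustrative.

def f1_score(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity), in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A model with 92% precision and 90% recall:
print(f"F1 = {f1_score(0.92, 0.90):.3f}")
```

Note the published figures are means (with standard deviations) of such per-model F1 scores across the four datasets, expressed as percentages.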
Purpose: To apply a deep learning algorithm for automated, objective, and comprehensive quantification of OCT scans to a large real-world dataset of eyes with neovascular age-related macular degeneration (AMD), and to make the raw segmentation output data openly available for further research.

Design: Retrospective analysis of OCT images from the Moorfields Eye Hospital AMD Database.

Participants: A total of 2473 first-treated eyes and 493 second-treated eyes that commenced therapy for neovascular AMD between June 2012 and June 2017.

Methods: A deep learning algorithm was used to segment all baseline OCT scans. Volumes were calculated for segmented features such as neurosensory retina (NSR), drusen, intraretinal fluid (IRF), subretinal fluid (SRF), subretinal hyperreflective material (SHRM), retinal pigment epithelium (RPE), hyperreflective foci (HRF), fibrovascular pigment epithelium detachment (fvPED), and serous PED (sPED). Analyses included comparisons between first- and second-treated eyes by visual acuity (VA) and race/ethnicity, and correlations between volumes.

Main Outcome Measures: Volumes of segmented features (mm³) and central subfield thickness (CST) (mm).

Results: In first-treated eyes, the majority had both IRF and SRF (54.7%). First-treated eyes had greater volumes for all segmented tissues, with the exception of drusen, which was greater in second-treated eyes. In first-treated eyes, older age was associated with lower volumes for RPE, SRF, NSR, and sPED; in second-treated eyes, older age was associated with lower volumes of NSR, RPE, sPED, fvPED, and SRF. Eyes from Black individuals had higher SRF, RPE, and serous PED volumes compared with other ethnic groups. Greater volumes of the majority of features were associated with worse VA.

Conclusions: We report the results of large-scale automated quantification of a novel range of baseline features in neovascular AMD.
Major differences between first- and second-treated eyes, with increasing age, and between ethnicities are highlighted. In the coming years, enhanced, automated OCT segmentation may assist personalisation of real-world care and the detection of novel structure–function correlations. These data will be made publicly available for replication and future investigation by the AMD research community.
Objectives: To analyse treatment outcomes and share clinical data from a large, single-centre, well-curated database (8174 eyes/6664 patients with 120 756 single entries) of patients with neovascular age-related macular degeneration (AMD) treated with anti-vascular endothelial growth factor (VEGF). By making our depersonalised raw data openly available, we aim to stimulate further research in AMD, as well as set a precedent for future work in this area.

Setting: Retrospective, comparative, non-randomised electronic medical record (EMR) database cohort study of the UK Moorfields AMD database, with data extracted between 2008 and 2018.

Participants: Including one eye per patient, 3357 eyes/patients (61% female). Extraction criteria were ≥1 ranibizumab or aflibercept injection, entry of ‘AMD’ in the diagnosis field of the EMR, and a minimum of 1 year of follow-up. Exclusion criteria were unknown date of first injection and treatment outside of routine clinical care at Moorfields before the first recorded injection in the database.

Main outcome measures: The primary outcome measure was change in VA at 1 and 2 years from baseline, as measured in Early Treatment Diabetic Retinopathy Study letters. Secondary outcomes were the number of injections and predictive factors for VA gain.

Results: Mean VA gains at 1 year and 2 years were +5.5 (95% CI 5.0 to 6.0) and +4.9 (95% CI 4.2 to 5.6) letters, respectively. Fifty-four per cent of eyes gained ≥5 letters at 2 years, 63% had stable VA (±≤14 letters), and 44% of eyes maintained good VA (≥70 letters). Patients received a mean of 7.7 (95% CI 7.6 to 7.8) injections during year 1 and 13.0 (95% CI 12.8 to 13.2) injections over 2 years. Younger age, lower baseline VA, and more injections were associated with higher VA gain at 2 years.

Conclusion: This study benchmarks high-quality EMR study results of real-life AMD treatment and promotes open science in clinical AMD research by making the underlying data publicly available.
IMPORTANCE Although multiple imputation models for missing data and the use of mixed-effects models generally provide better outcome estimates than using only observed data or last observation carried forward in clinical trials, such approaches usually cannot be applied to visual outcomes from retrospective analyses of clinical practice settings, also called real-world outcomes.

OBJECTIVE To explore the potential usefulness of survival analysis techniques for retrospective clinical practice visual outcomes.

DESIGN, SETTING, AND PARTICIPANTS This retrospective cohort study covered a 12-year observation period at a tertiary eye center. Of 10 744 eyes with neovascular age-related macular degeneration receiving anti-vascular endothelial growth factor (VEGF) therapy between October 28, 2008, and February 1, 2020, 7802 eyes met study criteria (treatment-naive, first-treated eyes starting anti-VEGF therapy). Eyes were excluded from the analysis if they received photodynamic therapy or macular laser, any previous anti-VEGF therapy, or treatment with anti-VEGF agents other than ranibizumab or aflibercept, or had an unknown date or visual acuity (VA) value at first injection.

MAIN OUTCOMES AND MEASURES Kaplan-Meier estimates and Cox proportional hazards modeling were used to consider VA reaching an Early Treatment Diabetic Retinopathy Study (ETDRS) letter score of 70 (Snellen equivalent, 20/40) or better, duration of VA sustained at or better than 70 (20/40), and VA declining to 35 (20/200) or worse.

RESULTS A total of 7802 patients (mean [SD] age, 78.7 [8.8] years; 4776 women [61.2%]; and 4785 White [61.3%]) were included in the study. The median time to attaining a VA letter score greater than or equal to 70 (20/40) was 2.0 years (95% CI, 1.87-2.32) after the first anti-VEGF injection.
Predictive features were baseline VA (hazard ratio [HR], 1.43 per 5 ETDRS letter score or 1 line; 95% CI, 1.40-1.46), baseline age (HR, 0.88 per 5 years; 95% CI, 0.86-0.90), and injection number (HR, 1.12; 95% CI, 1.10-1.15). Of the 4439 of 7802 patients (57%) attaining this outcome, median time sustained at an ETDRS letter score of 70 (20/40) or better was 1.1 years (95% CI, 1.1-1.2).

CONCLUSIONS AND RELEVANCE In this cohort study, patients with neovascular age-related macular degeneration beginning anti-VEGF therapy were more likely to experience positive visual outcomes within the first 2.0 years after treatment, typically maintaining this outcome for 1.1 years but then deteriorating to poor vision within 8.7 years. These findings demonstrate the potential usefulness of the proposed analyses. This data set, combined with the statistical approach for retrospective analyses, may provide long-term prognostic information for patients newly diagnosed with this condition.
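The median times reported above come from Kaplan-Meier estimation, which handles the censoring inherent in retrospective data (patients lost to follow-up before reaching the outcome). A minimal pure-Python sketch of the estimator, with invented times and event flags; real analyses would use dedicated tools such as the lifelines package or R's survival package:

```python
# Sketch: a minimal Kaplan-Meier estimator, the technique used to
# estimate median time to reaching a VA letter score of >=70.
# Times and event flags below are invented for illustration.

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each event time.
    events[i] is True if the outcome occurred, False if censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n_events = n_at_time = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            n_at_time += 1
            n_events += data[i][1]
            i += 1
        if n_events:
            surv *= 1 - n_events / n_at_risk  # product-limit update
            curve.append((t, surv))
        n_at_risk -= n_at_time
    return curve

def median_time(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up

# Five subjects, all reaching the outcome at years 1..5:
curve = kaplan_meier([1, 2, 3, 4, 5], [True] * 5)
print(median_time(curve))  # → 3
```

With no censoring this reduces to the empirical survival curve; censored subjects simply leave the risk set without triggering a probability update, which is what makes the approach suitable for the real-world cohorts described here.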