Background Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption in low-to-middle-income countries. This study investigated the use of artificial intelligence for fetal ultrasound in under-resourced settings. Methods Blind sweep ultrasounds, consisting of six freehand ultrasound sweeps, were collected by sonographers in the USA and Zambia, and novice operators in Zambia. We developed artificial intelligence (AI) models that used blind sweeps to predict gestational age (GA) and fetal malpresentation. AI GA estimates and standard fetal biometry estimates were compared to a previously established ground truth, and evaluated for difference in absolute error. Fetal malpresentation (non-cephalic vs cephalic) was compared to sonographer assessment. On-device AI model run-times were benchmarked on Android mobile phones. Results Here we show that GA estimation accuracy of the AI model is non-inferior to standard fetal biometry estimates (error difference −1.4 ± 4.5 days, 95% CI −1.8, −0.9, n = 406). Non-inferiority is maintained when blind sweeps are acquired by novice operators performing only two of six sweep motion types. Fetal malpresentation AUC-ROC is 0.977 (95% CI, 0.949, 1.00, n = 613), sonographers and novices have similar AUC-ROC. Software run-times on mobile phones for both diagnostic models are less than 3 s after completion of a sweep. Conclusions The gestational age model is non-inferior to the clinical standard and the fetal malpresentation model has high AUC-ROCs across operators and devices. Our AI models are able to run on-device, without internet connectivity, and provide feedback scores to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.
To assist with the replication of our results, we have expanded the Supplementary Methods of our Article to provide more detail on how our deep learning system was trained. This includes additional optimization hyperparameters, as well as a more exhaustive description of the data augmentation strategy. Revised Supplementary Methods are presented in the Supplementary Information to this Addendum. Please also see the accompanying Matters Arising Comment (Haibe-Kains et al.,
ImportanceFetal ultrasonography is essential for confirmation of gestational age (GA), and accurate GA assessment is important for providing appropriate care throughout pregnancy and for identifying complications, including fetal growth disorders. Derivation of GA from manual fetal biometry measurements (ie, head, abdomen, and femur) is operator dependent and time-consuming.ObjectiveTo develop artificial intelligence (AI) models to estimate GA with higher accuracy and reliability, leveraging standard biometry images and fly-to ultrasonography videos.Design, Setting, and ParticipantsTo improve GA estimates, this diagnostic study used AI to interpret standard plane ultrasonography images and fly-to ultrasonography videos, which are 5- to 10-second videos that can be automatically recorded as part of the standard of care before the still image is captured. Three AI models were developed and validated: (1) an image model using standard plane images, (2) a video model using fly-to videos, and (3) an ensemble model (combining both image and video models). The models were trained and evaluated on data from the Fetal Age Machine Learning Initiative (FAMLI) cohort, which included participants from 2 study sites at Chapel Hill, North Carolina (US), and Lusaka, Zambia. Participants were eligible to be part of this study if they received routine antenatal care at 1 of these sites, were aged 18 years or older, had a viable intrauterine singleton pregnancy, and could provide written consent. They were not eligible if they had known uterine or fetal abnormality, or had any other conditions that would make participation unsafe or complicate interpretation. Data analysis was performed from January to July 2022.Main Outcomes and MeasuresThe primary analysis outcome for GA was the mean difference in absolute error between the GA model estimate and the clinical standard estimate, with the ground truth GA extrapolated from the initial GA estimated at an initial examination.ResultsOf the total cohort of 3842 participants, data were calculated for a test set of 404 participants with a mean (SD) age of 28.8 (5.6) years at enrollment. All models were statistically superior to standard fetal biometry–based GA estimates derived from images captured by expert sonographers. The ensemble model had the lowest mean absolute error compared with the clinical standard fetal biometry (mean [SD] difference, −1.51 [3.96] days; 95% CI, −1.90 to −1.10 days). All 3 models outperformed standard biometry by a more substantial margin on fetuses that were predicted to be small for their GA.Conclusions and RelevanceThese findings suggest that AI models have the potential to empower trained operators to estimate GA with higher accuracy.
Diagnostic AI systems trained using deep learning have been shown to achieve expert-level identi cation of diseases in multiple medical imaging settings 1,2 . However, such systems are not always reliable and can fail in cases diagnosed accurately by clinicians and vice versa 3 . Mechanisms for leveraging this complementarity by learning to select optimally between discordant decisions of AIs and clinicians have remained largely unexplored in healthcare 4 , yet have the potential to achieve levels of performance that exceed that possible from either AI or clinician alone 4 .We develop a Complementarity-driven Deferral-to-Clinical Work ow (CoDoC) system that can learn to decide when to rely on a diagnostic AI model and when to defer to a clinician or their work ow. We show that our system is compatible with diagnostic AI models from multiple manufacturers, obtaining enhanced accuracy (sensitivity and/or speci city) relative to clinician-only or AI-only baselines in clinical work ows that screen for breast cancer or tuberculosis. For breast cancer, we demonstrate the rst system that exceeds the accuracy of double-reading with arbitration (the "gold standard" of care) in a large representative UK screening program, with 25% reduction in false positives despite equivalent truepositive detection, while achieving a 66% reduction in clinical workload. In two separate US datasets, CoDoC exceeds the accuracy of single-reading by board certi ed radiologists and two different standalone state-of-the-art AI systems, with generalisation of this nding in different diagnostic AI manufacturers. For TB screening with chest X-rays, CoDoC improved speci city (while maintaining sensitivity) compared to standalone AI or clinicians for 3 of 5 commercially available diagnostic AI systems (5-15% reduction in false positives). Further, we show the limits of con dence score based deferral systems for medical AI, by demonstrating that no deferral strategy could have achieved signi cant improvement on the remaining two diagnostic AI systems.Our comprehensive assessment demonstrates that the superiority of CoDoC is sustained in multiple realistic stress tests for generalisation of medical AI tools along four axes: variation in the medical imaging modality; variation in clinical settings and human experts; different clinical deferral pathways within a given modality; and different AI softwares. Further, given the simplicity of CoDoC we believe that practitioners can easily adapt it and we provide an open-source implementation to encourage widespread further research and application.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.