Background: Artificial intelligence (AI) systems performing at radiologist-like levels in the evaluation of digital mammography (DM) would improve breast cancer screening accuracy and efficiency. We aimed to compare the stand-alone performance of an AI system to that of radiologists in detecting breast cancer in DM.

Methods: Nine multi-reader, multi-case study datasets previously used for different research purposes in seven countries were collected. Each dataset consisted of DM exams acquired with systems from four different vendors, multiple radiologists' assessments per exam, and ground truth verified by histopathological analysis or follow-up, yielding a total of 2652 exams (653 malignant) and interpretations by 101 radiologists (28 296 independent interpretations). An AI system analyzed these exams, yielding for each exam a level of suspicion of cancer between 1 and 10. The detection performance of the radiologists and of the AI system was compared using a noninferiority null hypothesis at a margin of 0.05.

Results: The performance of the AI system was statistically noninferior to that of the average of the 101 radiologists. The AI system had an area under the ROC curve (AUC) of 0.840 (95% confidence interval [CI] = 0.820 to 0.860) and the average AUC of the radiologists was 0.814 (95% CI = 0.787 to 0.841) (95% CI of the difference = −0.003 to 0.055). The AI system had an AUC higher than 61.4% of the radiologists.

Conclusions: The evaluated AI system achieved a cancer detection accuracy comparable to that of an average breast radiologist in this retrospective setting. Although promising, the performance and impact of such a system in a screening setting need further investigation.
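The noninferiority decision reported in this abstract can be illustrated with a minimal sketch. It assumes the usual reading of a noninferiority test: the AI system is declared noninferior when the lower bound of the 95% CI of the AUC difference (AI minus radiologists) lies above the negative margin. The function name `is_noninferior` is hypothetical; the numbers are those quoted in the abstract.

```python
def is_noninferior(diff_ci_lower: float, margin: float = 0.05) -> bool:
    """Return True when the lower CI bound of (AI - readers) exceeds -margin.

    Assumption: this is the standard decision rule for a noninferiority
    test; the abstract does not spell out the exact procedure used.
    """
    return diff_ci_lower > -margin


# Values quoted in the abstract.
ai_auc = 0.840
reader_auc = 0.814
diff_ci = (-0.003, 0.055)  # 95% CI of the AUC difference

auc_difference = ai_auc - reader_auc
noninferior = is_noninferior(diff_ci[0], margin=0.05)

print(f"AUC difference: {auc_difference:.3f}")
print(f"Noninferior at margin 0.05: {noninferior}")
```

With the reported lower bound of −0.003, which is well above −0.05, the check is consistent with the noninferiority conclusion stated in the Results.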
Purpose: To study the feasibility of automatically identifying normal digital mammography (DM) exams with artificial intelligence (AI) to reduce the breast cancer screening reading workload.

Methods and materials: A total of 2652 DM exams (653 cancer) and interpretations by 101 radiologists were gathered from nine previously performed multi-reader multi-case receiver operating characteristic (MRMC ROC) studies. An AI system was used to obtain a score between 1 and 10 for each exam, representing the likelihood that cancer is present. Using each AI score between 1 and 9 as a possible threshold, the exams were divided into a low-likelihood group and a high-likelihood group. It was assumed that, under the pre-selection scenario, only the high-likelihood group would be read by radiologists, while all low-likelihood exams would be reported as normal. The area under the reader-averaged ROC curve (AUC) was calculated for the original evaluations and for the pre-selection scenarios, and the two were compared using a non-inferiority hypothesis.

Results: Setting the low/high-likelihood threshold at an AI score of 5 (high likelihood > 5) results in a trade-off of approximately halving (−47%) the workload to be read by radiologists while excluding 7% of true-positive exams. Using an AI score of 2 as the threshold yields a workload reduction of 17% while excluding only 1% of true-positive exams. Pre-selection did not change the average AUC of radiologists (lower bound of the 95% CI > −0.05) for any threshold except the extreme AI score of 9.

Conclusion: It is possible to automatically pre-select exams using AI to significantly reduce the breast cancer screening reading workload.

Key Points
• There is potential to use artificial intelligence to automatically reduce the breast cancer screening reading workload by excluding exams with a low likelihood of cancer.
• The exclusion of exams with the lowest likelihood of cancer in screening might not change radiologists' breast cancer detection performance.
• When excluding exams with the lowest likelihood of cancer, the decrease in true-positive recalls would be balanced by a simultaneous reduction in false-positive recalls.
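The pre-selection trade-off described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name `preselection_tradeoff`, the per-exam `is_true_positive` flag, and the ten toy exams are all hypothetical, and the sketch covers only the workload and excluded-true-positive fractions, not the reader-averaged AUC comparison.

```python
from typing import Sequence, Tuple


def preselection_tradeoff(
    scores: Sequence[int],
    is_true_positive: Sequence[bool],
    threshold: int,
) -> Tuple[float, float]:
    """Return (workload reduction, fraction of true positives excluded).

    Exams with an AI score <= threshold form the low-likelihood group and
    are assumed to be reported as normal without being read.
    """
    excluded = [score <= threshold for score in scores]
    workload_reduction = sum(excluded) / len(scores)
    n_true_positive = sum(is_true_positive)
    n_excluded_tp = sum(
        1 for out, tp in zip(excluded, is_true_positive) if out and tp
    )
    return workload_reduction, n_excluded_tp / n_true_positive


# Hypothetical toy data: AI scores (1-10) and true-positive flags.
toy_scores = [1, 2, 2, 3, 5, 6, 7, 9, 10, 10]
toy_true_positive = [False, False, False, False, True,
                     False, False, True, True, True]

for threshold in (2, 5):
    reduction, lost = preselection_tradeoff(
        toy_scores, toy_true_positive, threshold
    )
    print(f"threshold {threshold}: workload -{reduction:.0%}, "
          f"true positives excluded {lost:.0%}")
```

Raising the threshold moves more exams into the unread group, so the workload reduction grows at the cost of excluding more true-positive exams, which is the trade-off the Results quantify at thresholds 2 and 5.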
In the present investigation, we analyze the dose received by 5034 patients (20,137 images) who underwent mammographic examinations with a full-field digital mammography system. We also evaluate the system calibration by analyzing the exposure factors as a function of breast thickness. The information relevant to this study was extracted from the image DICOM header and stored in a database over a 3-year period (March 2001-October 2003). Patient data included age, breast thickness, kVp, mAs, target/filter combination, and nominal dose values. Entrance surface air kerma (ESAK) without backscatter was calculated from the tube output, as measured for each voltage used under clinical conditions, and from the tube loading (mAs) included in the DICOM header. Mean values for patient age and compressed breast thickness were 56 years (SD: 11) and 52 mm (SD: 13), respectively. The majority of the images (98%) were acquired using the STD (standard) automatic mode. The most frequent target/filter combination automatically selected for breasts thinner than 35 mm was Mo/Mo (75%); for intermediate thicknesses between 35 and 65 mm, the combinations were Mo/Rh (54%) and Rh/Rh (38.5%); for breasts thicker than 65 mm, Rh/Rh was selected in 91% of the cases. A wide kVp range was observed for each target/filter combination. The most frequent values were 28 kVp for Mo/Mo, 29 kVp for Mo/Rh, and 29 and 30 kVp for Rh/Rh. Exposure times ranged from 0.2 to 4.2 s, with a mean value of 1.1 s. Average glandular doses (AGD) per exposure were calculated by multiplying the ESAK values by the conversion factors tabulated by Dance for women in the age groups 50 to 64 and 40 to 49. This approach is based on the dependence of breast glandularity on breast thickness and age. The total mean average glandular dose (AGD(T)) was calculated by summing the values associated with the pre-exposure and with the main exposure.
Mean AGD(T) per exposure was 1.88 mGy (CI 0.01) and the mean AGD(T) per examination was 3.8 mGy, with 4 images per examination on average. The mean dose for cranio-caudal (CC) view images was 1.8 mGy, which is lower than that for medio-lateral oblique (MLO) view images because the thickness for CC images was on average 10% lower than that for MLO images. Mean AGD(T) for the older group of women (1.90 mGy) was 3% higher than the AGD(T) for the younger group (1.85 mGy) because of the larger compressed breast thickness of women in the older group (10% on average). Differences between the corresponding AGD(T) values of the two age groups were lowest for breast thicknesses in the range 40-60 mm, with values slightly higher for the women in the older group.
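The dose calculation chain described above (tube output and mAs to ESAK, ESAK times a conversion factor to AGD, then pre-exposure plus main exposure to AGD(T)) can be sketched as below. This is a minimal illustration under assumptions: the function names are hypothetical, the tube output is treated as a single value in mGy/mAs already valid at the entrance surface, and every numeric input is a made-up placeholder. In particular, the conversion factor is not a value from Dance's tables.

```python
def esak_mgy(tube_output_mgy_per_mas: float, mas: float) -> float:
    """Entrance surface air kerma (no backscatter) from tube output and mAs."""
    return tube_output_mgy_per_mas * mas


def agd_mgy(esak: float, conversion_factor: float) -> float:
    """Average glandular dose as ESAK times a tabulated conversion factor."""
    return esak * conversion_factor


def total_agd_mgy(
    pre_mas: float,
    main_mas: float,
    tube_output_mgy_per_mas: float,
    conversion_factor: float,
) -> float:
    """AGD(T): sum of the pre-exposure and main-exposure contributions."""
    pre = agd_mgy(esak_mgy(tube_output_mgy_per_mas, pre_mas), conversion_factor)
    main = agd_mgy(esak_mgy(tube_output_mgy_per_mas, main_mas), conversion_factor)
    return pre + main


# Placeholder inputs for illustration only (not measured or tabulated values).
example_agd_t = total_agd_mgy(
    pre_mas=5.0,
    main_mas=95.0,
    tube_output_mgy_per_mas=0.1,
    conversion_factor=0.2,
)
print(f"AGD(T) for the placeholder exposure: {example_agd_t:.2f} mGy")
```

In practice the tube output and the conversion factor would be looked up per image from the kVp, target/filter combination, breast thickness, and age group recorded in the DICOM header, rather than passed in as constants.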