We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and find our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.deep learning | deep convolutional neural networks | breast cancer screening | mammography B reast cancer is the second leading cancer-related cause of death among women in the US. In 2014, over 39 million screening and diagnostic mammography exams were performed in the US. It is estimated that in 2015 232,000 women were diagnosed with breast cancer and approximately 40,000 died from it (1). Although mammography is the only imaging test that has reduced breast cancer mortality (2-4), there has been discussion regarding the potential harms of screening, including false positive recalls and associated false positive biopsies. The vast majority of the 10-15% of women asked to return following an inconclusive screening mammogram undergo another mammogram and/or ultrasound for clarification. After the additional imaging exams, many of these findings are determined as benign and only 10-20% are recommended to undergo a needle biopsy for further work-up. Among these, only 20-40% yield a diagnosis of cancer (5). Evidently, there is an unmet need to shift the balance of routine breast cancer screening towards more benefit and less harm.Traditional computer-aided detection (CAD) in mammography is routinely used by radiologists to assist with image interpretation, despite multicenter studies showing these CAD programs do not improve their diagnostic performance (6).These CAD programs typically use handcrafted features to mark sites on a mammogram that appear distinct from normal tissue structures. The radiologist decides whether to recall these findings, determining clinical significance and actionability. Recent developments in deep learning (7)-in particular, deep convolutional neural networks (CNNs) (8-12)-open possibilities for creating a new generation of CAD-like tools.This paper makes several contributions. Primarily, we train and evaluate a set of stro...
Breast cancer screening recommendations are based on risk factors. For average-risk women, screening mammography and/or digital breast tomosynthesis is recommended beginning at age 40. Ultrasound (US) may be useful as an adjunct to mammography for incremental cancer detection in women with dense breasts, but the balance between increased cancer detection and the increased risk of a false-positive examination should be considered in the decision. For intermediate-risk women, US or MRI may be indicated as an adjunct to mammography depending upon specific risk factors. For women at high risk due to prior mantle radiation between the ages of 10 to 30, mammography is recommended starting 8 years after radiation therapy but not before age 25. For women with a genetic predisposition, annual screening mammography is recommended beginning 10 years earlier than the affected relative at the time of diagnosis but not before age 30. Annual screening MRI is recommended in high-risk women as an adjunct to mammography. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision include an extensive analysis of current medical literature from peer reviewed journals and the application of well-established methodologies (RAND/UCLA Appropriateness Method and Grading of Recommendations Assessment, Development, and Evaluation or GRADE) to rate the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where evidence is lacking or equivocal, expert opinion may supplement the available evidence to recommend imaging or treatment.
Though consistently shown to detect mammographically occult cancers, breast ultrasound has been noted to have high false-positive rates. In this work, we present an AI system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. Developed on 288,767 exams, consisting of 5,442,907 B-mode and Color Doppler images, the AI achieves an area under the receiver operating characteristic curve (AUROC) of 0.976 on a test set consisting of 44,755 exams. In a retrospective reader study, the AI achieves a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924 ± 0.02 radiologists). With the help of the AI, radiologists decrease their false positive rates by 37.3% and reduce requested biopsies by 27.8%, while maintaining the same level of sensitivity. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis.
Appropriate imaging evaluation of nipple discharge depends the nature of the discharge. Imaging is not indicated for women with physiologic nipple discharge. For evaluation of pathologic nipple discharge, multiple breast imaging modalities are rated for evidence-based appropriateness under various scenarios. For women age 40 or older, mammography or digital breast tomosynthesis (DBT) should be the initial examination. Ultrasound is usually added as a complementary examination, with some exceptions. For women age 30 to 39, either mammogram or ultrasound may be used as the initial examination on the basis of institutional preference. For women age 30 or younger, ultrasound should be the initial examination, with mammography/DBT added when ultrasound shows suspicious findings or if the patient is predisposed to developing breast cancer. For men age 25 or older, mammography/DBT should be performed initially, with ultrasound added as indicated, given the high incidence of breast cancer in men with pathologic nipple discharge. Although MRI and ductography are not usually appropriate as initial examinations, each may be useful when the initial standard imaging evaluation is negative. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision include an extensive analysis of current medical literature from peer reviewed journals and the application of well-established methodologies (RAND/UCLA Appropriateness Method and Grading of Recommendations Assessment, Development, and Evaluation or GRADE) to rate the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where evidence is lacking or equivocal, expert opinion may supplement the available evidence to recommend imaging or treatment.
OBJECTIVE. The purpose of this article is to compare traditional versus machine learning–based computer-aided detection (CAD) platforms in breast imaging with a focus on mammography, to underscore limitations of traditional CAD, and to highlight potential solutions in new CAD systems under development for the future. CONCLUSION. CAD development for breast imaging is undergoing a paradigm shift based on vast improvement of computing power and rapid emergence of advanced deep learning algorithms, heralding new systems that may hold real potential to improve clinical care.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.