IMPORTANCE Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. OBJECTIVE To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms. DESIGN, SETTING, AND PARTICIPANTS In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016. MAIN OUTCOMES AND MEASURES Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that was translated into a binary cancer/no-cancer prediction within 12 months of screening. Algorithm accuracy for breast cancer detection was evaluated using the area under the curve, and algorithm specificity was compared with radiologists' specificity at a fixed radiologist sensitivity of 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated. RESULTS Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden), and a specificity of 66.2% (United States) and 81.2% (Sweden) at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining the top-performing algorithms with US radiologist assessments yielded a higher area under the curve of 0.942 and a significantly improved specificity (92.0%) at the same sensitivity. CONCLUSIONS AND RELEVANCE While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine (continued)
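The ensemble method itself is not detailed in the abstract; as a minimal sketch, one plausible way to combine a continuous algorithm score with a binary radiologist recall decision is a logistic-regression stacker, evaluated by AUC and by specificity at a fixed sensitivity. All data and variable names below are illustrative assumptions, not the study's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative synthetic stand-ins: a continuous AI score per examination
# and a binary radiologist recall decision, with cancer-within-12-months labels.
rng = np.random.default_rng(0)
n = 10_000
y = rng.binomial(1, 0.01, n)                           # ~1% cancer prevalence
ai_score = np.clip(0.3 * y + rng.beta(2, 8, n), 0, 1)
recall = rng.binomial(1, np.where(y == 1, 0.86, 0.10))

# Simple stacking ensemble over both signals. In practice the stacker would
# be fit on a held-out split, not on the evaluation data as done here.
X = np.column_stack([ai_score, recall])
combined = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

print("AI alone AUC:", roc_auc_score(y, ai_score))
print("Ensemble AUC:", roc_auc_score(y, combined))

# Specificity at a fixed sensitivity of 85.9%, as reported for US readers.
fpr, tpr, _ = roc_curve(y, combined)
print("Specificity @ 85.9% sensitivity:", 1 - fpr[np.searchsorted(tpr, 0.859)])
```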
Objectives Body tissue composition is a long-known biomarker with high diagnostic and prognostic value not only in cardiovascular, oncological, and orthopedic diseases but also in rehabilitation medicine and drug dosing. In this study, the aim was to develop a fully automated, reproducible, and quantitative 3D volumetry of body tissue composition from standard CT examinations of the abdomen, in order to offer such valuable biomarkers as part of routine clinical imaging. Methods An in-house dataset of 40 CTs for training and 10 CTs for testing was fully annotated on every fifth axial slice with five different semantic body regions: abdominal cavity, bones, muscle, subcutaneous tissue, and thoracic cavity. Multi-resolution U-Net 3D neural networks were employed for segmenting these body regions, followed by subclassifying adipose tissue and muscle using known Hounsfield unit limits. Results The Sørensen Dice score averaged over all semantic regions was 0.9553, and the intra-class correlation coefficients for subclassified tissues were above 0.99. Conclusions Our results show that fully automated body composition analysis on routine CT imaging can provide stable biomarkers across the whole abdomen, and not just at L3, which is historically the reference location for analyzing body composition in clinical routine. Key Points • Our study enables fully automated body composition analysis on routine abdominal CT scans. • The best segmentation models for semantic body region segmentation achieved an averaged Sørensen Dice score of 0.9553. • Subclassified tissue volumes achieved intra-class correlation coefficients over 0.99.
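The subclassification step applies fixed Hounsfield unit (HU) windows inside each segmented region. A minimal sketch, assuming the commonly cited windows of −190 to −30 HU for adipose tissue and −29 to 150 HU for muscle (the study's exact limits are not stated above, and the segmentation mask here is a placeholder):

```python
import numpy as np

# Commonly used HU windows for tissue subclassification; the exact limits
# used in the study above are an assumption for this sketch.
ADIPOSE_HU = (-190, -30)
MUSCLE_HU = (-29, 150)

def subclassify(ct_hu: np.ndarray, region_mask: np.ndarray, hu_range: tuple) -> np.ndarray:
    """Return a boolean mask of voxels inside region_mask whose HU value
    falls within hu_range (inclusive)."""
    lo, hi = hu_range
    return region_mask & (ct_hu >= lo) & (ct_hu <= hi)

# Example: adipose tissue volume in millilitres within a placeholder
# subcutaneous-tissue segmentation of a synthetic volume.
ct = np.random.randint(-1000, 1000, size=(40, 512, 512)).astype(np.int16)
subcutaneous = np.zeros_like(ct, dtype=bool)
subcutaneous[:, 100:400, 100:400] = True               # placeholder mask

sat_mask = subclassify(ct, subcutaneous, ADIPOSE_HU)
voxel_volume_ml = (5.0 * 0.75 * 0.75) / 1000.0         # assumed spacing in mm
print("SAT volume [ml]:", sat_mask.sum() * voxel_volume_ml)
```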
Depression is ranked as the largest contributor to global disability and is also a major reason for suicide. Still, many individuals suffering from forms of depression are not treated for various reasons. Previous studies have shown that depression also affects language usage and that many depressed individuals use social media platforms or the internet in general to get information or discuss their problems. This paper addresses the early detection of depression using machine learning models based on messages on a social platform. In particular, a convolutional neural network based on different word embeddings is evaluated and compared to a classification based on user-level linguistic metadata. An ensemble of both approaches is shown to achieve state-of-the-art results in a current early detection task. Furthermore, the currently popular ERDE score as a metric for early detection systems is examined in detail, and its drawbacks in the context of shared tasks are illustrated. A slightly modified metric is proposed and compared to the original score. Finally, a new word embedding was trained on a large corpus from the same domain as the described task and is evaluated as well.
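For reference, the ERDE score discussed above penalizes correct positive decisions by how late they are made. The sketch below follows the original definition by Losada and Crestani; the cost constants follow the eRisk shared-task convention and should be treated as assumptions here:

```python
import math

def latency_cost(k: int, o: int) -> float:
    """Latency penalty lc_o(k) = 1 - 1/(1 + exp(k - o)); grows with the
    number of user writings k observed before a correct positive decision."""
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))

def erde(decision: int, truth: int, k: int, o: int = 50,
         c_fp: float = 0.1296, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Early Risk Detection Error for a single subject. Setting c_fp to the
    proportion of positive subjects is an eRisk convention, assumed here."""
    if decision == 1 and truth == 0:
        return c_fp                      # false positive: flat cost
    if decision == 0 and truth == 1:
        return c_fn                      # false negative: maximal cost
    if decision == 1 and truth == 1:
        return latency_cost(k, o) * c_tp  # true positive: cost grows with delay
    return 0.0                           # true negative: no cost

# A correct detection after 10 writings is penalized far less than after 100.
print(erde(1, 1, k=10))   # near-zero latency penalty
print(erde(1, 1, k=100))  # near-maximal penalty
```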
Objectives To reduce the dose of intravenous iodine-based contrast media (ICM) in CT through virtual contrast-enhanced images using generative adversarial networks. Methods Dual-energy CTs in the arterial phase of 85 patients were randomly split into an 80/20 train/test collective. Four different generative adversarial networks (GANs) were trained on image pairs, each comprising one slice with virtually reduced ICM and the original full-ICM CT slice, testing two input formats (2D and 2.5D) and two reduced ICM dose levels (−50% and −80%). The amount of intravenous ICM was virtually reduced by creating non-contrast series from the dual-energy data and adding back the corresponding percentage of the iodine map. The evaluation was based on different scores (L1 loss, SSIM, PSNR, FID) that measure image quality and similarity. Additionally, a visual Turing test (VTT) with three radiologists was used to assess similarity and pathological consistency. Results The −80% models reach an SSIM of > 98%, a PSNR of > 48, an L1 loss between 7.5 and 8, and an FID between 1.6 and 1.7. In comparison, the −50% models reach an SSIM of > 99%, a PSNR of > 51, an L1 loss between 6.0 and 6.1, and an FID between 0.8 and 0.95. For the crucial question of pathological consistency, only the −50% ICM reduction networks achieved 100% consistency, which is required for clinical use. Conclusions The required amount of ICM for CT can be reduced by 50% while maintaining image quality and diagnostic accuracy using GANs. Further phantom studies and animal experiments are required to confirm these initial results. Key Points • The amount of contrast media required for CT can be reduced by 50% using generative adversarial networks. • Not only the image quality but especially the pathological consistency must be evaluated to assess safety. • A more pronounced contrast media reduction, −80% in our collective, could compromise pathological consistency.
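The L1, SSIM, and PSNR scores used above can be computed per slice pair with scikit-image; FID is omitted here because it additionally requires a pretrained Inception network. A minimal sketch, assuming images normalized to [0, 1] (the study's exact preprocessing is not specified):

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def similarity_report(real: np.ndarray, synthetic: np.ndarray) -> dict:
    """Compare a virtually contrast-enhanced slice against the original
    full-dose slice. Both arrays are assumed float, scaled to [0, 1]."""
    return {
        "L1": float(np.mean(np.abs(real - synthetic))),
        "SSIM": structural_similarity(real, synthetic, data_range=1.0),
        "PSNR": peak_signal_noise_ratio(real, synthetic, data_range=1.0),
    }

# Illustrative call with random data standing in for CT slices.
rng = np.random.default_rng(42)
real = rng.random((512, 512))
synthetic = np.clip(real + rng.normal(0, 0.01, real.shape), 0, 1)
print(similarity_report(real, synthetic))
```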
(1) Background: Epicardial and paracardial adipose tissue (EAT, PAT) have been spotlighted as important biomarkers in cardiological assessment in recent years. Since biomarker quantification is an increasingly important method for clinical use, we wanted to examine fully automated EAT and PAT quantification for possible use in cardiovascular risk stratification. (2) Methods: 966 patients with intermediate Framingham risk scores for coronary artery disease who were referred for coronary calcium scans in clinical routine were included retrospectively. The Coronary Artery Calcium Score (CACS) was extracted, and tissue quantification was performed by a deep learning network. (3) Results: The computed tomography (CT) segmentations predicted by the network indicated no significant correlation of the Agatston score with either EAT volume (r = 0.18) or EAT radiodensity (r = −0.09). CACS 0 category patients showed significantly lower total EAT and PAT volumes and higher EAT and PAT densities than CACS 1–99 category patients (p < 0.01). Notably, this difference did not reach significance for EAT attenuation in male patients. Women older than 50 years, and thus more likely to be postmenopausal, were shown to be at higher risk of coronary calcification (p < 0.01, OR = 4.59). Differences between CACS 1–99 and CACS ≥100 category patients remained below the significance level (EAT volume: p = 0.087; EAT attenuation: p = 0.98). (4) Conclusions: Our study demonstrates the feasibility of fully automated adipose tissue analysis in clinical cardiac CT and confirms in a large clinical cohort that volume and attenuation of EAT and PAT are not correlated with CACS. Rapid and reliable deep learning-based tissue quantification should thus be discussed as a broadly available method to assess this biomarker as a supplementary risk predictor in cardiac CT.
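As an illustration of the reported statistics, the sketch below computes a Pearson correlation against the Agatston score and a between-category group comparison; the choice of Mann-Whitney U test and all data are assumptions standing in for the study's actual analysis:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for per-patient values; not the study's data.
rng = np.random.default_rng(1)
agatston = np.where(rng.random(966) < 0.4, 0.0, rng.gamma(1.0, 80.0, 966))
eat_volume = rng.normal(120, 30, 966)

# Correlation of EAT volume with Agatston score (cf. r = 0.18 above).
r, p = stats.pearsonr(agatston, eat_volume)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# Group comparison between the CACS 0 and CACS 1-99 categories.
cacs0 = eat_volume[agatston == 0]
cacs1_99 = eat_volume[(agatston > 0) & (agatston < 100)]
u, p = stats.mannwhitneyu(cacs0, cacs1_99)
print(f"Mann-Whitney U p = {p:.3f}")
```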
Patients with neuroendocrine tumors of gastro-entero-pancreatic origin (GEP-NET) experience changes in fat and muscle composition. Dual-energy X-ray absorptiometry (DXA) and bioelectrical impedance analysis (BIA) are currently used to analyze body composition. Changes thereof could indicate cancer progression or response to treatment. This study examines the correlation between computed tomography (CT)-based body composition analysis (BCA) and DXA or BIA measurements. 74 GEP-NET patients received whole-body [68Ga]-DOTATOC-PET/CT, BIA, and DXA scans. BCA was performed on the non-contrast-enhanced 5 mm whole-body CT images. BCA from CT shows a strong correlation of body fat ratio with DXA (r = 0.95, ρC = 0.83) and BIA (r = 0.92, ρC = 0.76), and of skeletal muscle ratio with BIA (r = 0.81, ρC = 0.49). The deep learning network achieves highly accurate results (mean Sørensen-Dice score 0.93). Using BCA on routine positron emission tomography/CT scans to monitor patients' body composition in the diagnostic workflow can reduce the need for additional examinations while substantially increasing the frequency of body composition measurements in slower-progressing cancers such as GEP-NET.
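The ρC values above are most plausibly Lin's concordance correlation coefficient, which scores agreement between two measurement methods rather than mere correlation (a systematic offset lowers ρC but not Pearson's r). A minimal sketch of that interpretation, with illustrative data:

```python
import numpy as np

def concordance_ccc(x: np.ndarray, y: np.ndarray) -> float:
    """Lin's concordance correlation coefficient:
    rho_C = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
    Penalizes both poor correlation and systematic bias."""
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

# Illustrative: CT-derived body fat ratio vs. a DXA stand-in with a small
# systematic offset; CCC drops below Pearson r because of the bias.
rng = np.random.default_rng(7)
ct_fat = rng.uniform(0.15, 0.45, 74)
dxa_fat = ct_fat + 0.03 + rng.normal(0, 0.02, 74)
print("CCC:      ", concordance_ccc(ct_fat, dxa_fat))
print("Pearson r:", np.corrcoef(ct_fat, dxa_fat)[0, 1])
```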
Background Detection of ossification areas of hand bones in X-ray images is an important task, e.g. as a preprocessing step in automated bone age estimation. Deep neural networks have emerged recently as the de facto standard detection method, but their drawback is the need for large annotated datasets. Finetuning pre-trained networks is a viable alternative, but it is not clear a priori whether training with small annotated datasets will be successful, as this depends on the problem at hand. In this paper, we show that pre-trained networks can be utilized to produce an effective detector of ossification areas in pediatric X-ray images of hands. Methods and findings A publicly available Faster R-CNN network, pre-trained on the COCO dataset, was utilized and finetuned with 240 manually annotated radiographs from the RSNA Pediatric Bone Age Challenge, which comprises over 14,000 pediatric radiographs. Validation was performed on another 89 radiographs from the dataset, and performance was measured by Intersection-over-Union (IoU). To understand the effect of data size on the pre-trained network, the training data were subsampled and training was repeated. Additionally, the network was trained from scratch without any pre-trained weights. Finally, to understand whether the trained model could be useful, we compared the inference of the network to the annotations of an expert radiologist. The finetuned network achieved a mean average precision (mAP@0.5 IoU) of 92.92 ± 1.93. Apart from the wrist region, all ossification areas benefited from more data. In contrast, the network trained from scratch was not able to produce any correct results. Compared to the annotations of the expert radiologist, the network localized the regions well, with an average F1 score of 91.85 ± 1.06. Conclusions By finetuning a pre-trained deep neural network with 240 annotated radiographs, we were able to successfully detect ossification areas in pediatric hand radiographs.
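The abstract does not name the implementation; assuming the widely used torchvision port of Faster R-CNN, finetuning a COCO-pretrained detector for ossification areas could be sketched as follows (the class count and all data below are placeholders, not the paper's setup):

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Assumed number of ossification-area classes plus background; the paper
# does not list the class count here.
NUM_CLASSES = 1 + 13

# Load a Faster R-CNN pre-trained on COCO and swap in a new box predictor
# sized for the target classes, keeping the pre-trained backbone weights.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Standard torchvision training step: the model takes a list of image
# tensors and a list of target dicts with "boxes" and "labels".
model.train()
images = [torch.rand(3, 800, 800)]  # radiograph replicated to 3 channels
targets = [{
    "boxes": torch.tensor([[100.0, 100.0, 200.0, 200.0]]),
    "labels": torch.tensor([1]),
}]
loss_dict = model(images, targets)
loss = sum(loss_dict.values())
loss.backward()
```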