In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients—manually annotated by up to four raters—and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%–85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.
As artificial intelligence (AI) systems begin to make their way into clinical radiology practice, it is crucial to assure that they function correctly and that they gain the trust of experts. Toward this goal, approaches to make AI "interpretable" have gained attention to enhance the understanding of a machine learning algorithm, despite its complexity. This article aims to provide insights into the current state of the art of interpretability methods for radiology AI. This review discusses radiologists' opinions on the topic and suggests trends and challenges that need to be addressed to effectively streamline interpretability methods in clinical practice.
Information about the size of a tumor and its temporal evolution is needed for diagnosis as well as treatment of brain tumor patients. The aim of the study was to investigate the potential of a fully-automatic segmentation method, called BraTumIA, for longitudinal brain tumor volumetry by comparing the automatically estimated volumes with ground truth data acquired via manual segmentation. Longitudinal Magnetic Resonance (MR) Imaging data of 14 patients with newly diagnosed glioblastoma encompassing 64 MR acquisitions, ranging from preoperative up to 12 month follow-up images, was analysed. Manual segmentation was performed by two human raters. Strong correlations (R = 0.83–0.96, p < 0.001) were observed between volumetric estimates of BraTumIA and of each of the human raters for the contrast-enhancing (CET) and non-enhancing T2-hyperintense tumor compartments (NCE-T2). A quantitative analysis of the inter-rater disagreement showed that the disagreement between BraTumIA and each of the human raters was comparable to the disagreement between the human raters. In summary, BraTumIA generated volumetric trend curves of contrast-enhancing and non-enhancing T2-hyperintense tumor compartments comparable to estimates of human raters. These findings suggest the potential of automated longitudinal tumor segmentation to substitute manual volumetric follow-up of contrast-enhancing and non-enhancing T2-hyperintense tumor compartments.
Reproducible definition and quantification of imaging biomarkers is essential. We evaluated a fully automatic MR-based segmentation method by comparing it to manually defined sub-volumes by experienced radiologists in the TCGA-GBM dataset, in terms of sub-volume prognosis and association with VASARI features. MRI sets of 109 GBM patients were downloaded from the Cancer Imaging archive. GBM sub-compartments were defined manually and automatically using the Brain Tumor Image Analysis (BraTumIA). Spearman’s correlation was used to evaluate the agreement with VASARI features. Prognostic significance was assessed using the C-index. Auto-segmented sub-volumes showed moderate to high agreement with manually delineated volumes (range (r): 0.4 – 0.86). Also, the auto and manual volumes showed similar correlation with VASARI features (auto r = 0.35, 0.43 and 0.36; manual r = 0.17, 0.67, 0.41, for contrast-enhancing, necrosis and edema, respectively). The auto-segmented contrast-enhancing volume and post-contrast abnormal volume showed the highest AUC (0.66, CI: 0.55–0.77 and 0.65, CI: 0.54–0.76), comparable to manually defined volumes (0.64, CI: 0.53–0.75 and 0.63, CI: 0.52–0.74, respectively). BraTumIA and manual tumor sub-compartments showed comparable performance in terms of prognosis and correlation with VASARI features. This method can enable more reproducible definition and quantification of imaging based biomarkers and has potential in high-throughput medical imaging research.
Uncertainty estimation methods are expected to improve the understanding and quality of computer-assisted methods used in medical applications (e.g., neurosurgical interventions, radiotherapy planning), where automated medical image segmentation is crucial. In supervised machine learning, a common practice to generate ground truth label data is to merge observer annotations. However, as many medical image tasks show a high inter-observer variability resulting from factors such as image quality, different levels of user expertise and domain knowledge, little is known as to how inter-observer variability and commonly used fusion methods affect the estimation of uncertainty of automated image segmentation. In this paper we analyze the effect of common image label fusion techniques on uncertainty estimation, and propose to learn the uncertainty among observers. The results highlight the negative effect of fusion methods applied in deep learning, to obtain reliable estimates of segmentation uncertainty. Additionally, we show that the learned observers' uncertainty can be combined with current standard Monte Carlo dropout Bayesian neural networks to characterize uncertainty of model's parameters.
Background: Automated brain tumor segmentation methods are computational algorithms that yield tumor delineation from, in this case, multimodal magnetic resonance imaging (MRI). We present an automated segmentation method and its results for resection cavity (RC) in glioblastoma multiforme (GBM) patients using deep learning (DL) technologies. Methods: Post-operative, T1w with and without contrast, T2w and fluid attenuated inversion recovery MRI studies of 30 GBM patients were included. Three radiation oncologists manually delineated the RC to obtain a reference segmentation. We developed a DL cavity segmentation method, which utilizes all four MRI sequences and the reference segmentation to learn to perform RC delineations. We evaluated the segmentation method in terms of Dice coefficient (DC) and estimated volume measurements. Results: Median DC of the three radiation oncologist were 0.85 (interquartile range [IQR]: 0.08), 0.84 (IQR: 0.07), and 0.86 (IQR: 0.07). The results of the automatic segmentation compared to the three different raters were 0.83 (IQR: 0.14), 0.81 (IQR: 0.12), and 0.81 (IQR: 0.13) which was significantly lower compared to the DC among raters (chisquare = 11.63, p = 0.04). We did not detect a statistically significant difference of the measured RC volumes for the different raters and the automated method (Kruskal-Wallis test: chi-square = 1.46, p = 0.69). The main sources of error were due to signal inhomogeneity and similar intensity patterns between cavity and brain tissues. Conclusions: The proposed DL approach yields promising results for automated RC segmentation in this proof of concept study. Compared to human experts, the DC are still subpar.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.