Background: Accurate infarct volume measurement requires manual segmentation on diffusion-weighted imaging (DWI), which is time-consuming and prone to variability. We compared two DWI infarct segmentation programs, one based on deep learning (JBS-01K) and one based on an apparent diffusion coefficient threshold (RAPID DWI), in a comprehensive stroke center. Method: We included 414 patients whose DWI scans were evaluated using RAPID DWI and JBS-01K. We used Bland-Altman plots to compare estimated and manually segmented infarct volumes. We compared R-squared, root mean squared error, Akaike information criterion, and log likelihood after linear regression of manually segmented infarct volumes. Results: The mean age of included patients was 70.0±12.4 years, and 60.9% were male. The median time between last known well and DWI was 12.4 hours. Infarct volumes segmented by JBS-01K agreed more closely with manually segmented volumes than those segmented by RAPID DWI. Compared to RAPID DWI, JBS-01K had a lower root mean squared error (6.9 vs. 10.8) and log likelihood (p<0.001). In addition, JBS-01K more correctly classified patients according to the infarct volume threshold used in endovascular treatment trials (overall accuracy 98.1% vs. 94.0%; p = 0.002). In the 35 patients who underwent DWI prior to endovascular treatment, JBS-01K infarct volume segmentation was also more closely related to manual infarct volume segmentation. Conclusion: We demonstrated that a deep learning method segmented infarcts on DWI more accurately than a method based on the apparent diffusion coefficient threshold.
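The agreement statistics named in the Method can be illustrated with a minimal sketch. It assumes paired volumes in millilitres for the same patients. The function names `bland_altman` and `rmse` are hypothetical, and the 1.96 multiplier is the conventional 95% limit of agreement, not a value taken from the abstract. The RMSE here is computed directly on the paired volumes, whereas the abstract reports it after linear regression, so the numbers are not directly comparable.

```python
import math
import statistics


def bland_altman(estimated, manual):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (estimated - manual); the limits of agreement
    are bias +/- 1.96 * sample SD of the differences (conventional
    95% limits, an assumption of this sketch).
    """
    if len(estimated) != len(manual) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences with >= 2 pairs")
    diffs = [e - m for e, m in zip(estimated, manual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def rmse(estimated, manual):
    """Root mean squared error between paired measurements."""
    if len(estimated) != len(manual) or not estimated:
        raise ValueError("need two non-empty, equal-length sequences")
    squared = [(e - m) ** 2 for e, m in zip(estimated, manual)]
    return math.sqrt(statistics.mean(squared))


if __name__ == "__main__":
    # Made-up illustrative volumes (ml), not data from the study.
    manual = [10.0, 20.0, 30.0]
    automatic = [11.0, 19.0, 33.0]
    print(bland_altman(automatic, manual))
    print(rmse(automatic, manual))
```

A narrower interval between the two limits of agreement, and a bias closer to zero, indicate that the automatic volumes track the manual reference more closely.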
Background: Deep learning-based artificial intelligence techniques have been developed for automatic segmentation of diffusion-weighted magnetic resonance imaging (DWI) lesions, but mostly using single-site training data with modest sample sizes. Objective: To explore the effects of 1) various sample sizes of multi-site vs. single-site training data, 2) domain adaptation, i.e., the use of target-domain data to overcome the domain shift problem, in which a model that performs well in the source domain performs poorly in the target domain, and 3) data sources and features on the performance and generalizability of deep learning algorithms for the segmentation of infarcts on DW images. Methods: In this nationwide multicenter study, 10,820 DWI datasets from 10 hospitals (Internal dataset) were used for the training-and-validation (Training-and-validation dataset with six progressively larger subsamples: n=217, 433, 866, 1,732, 4,330, and 8,661 sets, yielding six algorithms) and internal test (Internal test dataset: 2,159 sets without overlapping sample) of 3D U-net algorithms for automatic DWI lesion segmentation. In addition, 476 DW images from one of the 10 hospitals (Single-site dataset) were used for training-and-validation (n=382) and internal test (n=94) of another algorithm. Then, 2,777 DW images from a different hospital (External dataset) and two ancillary test datasets (I, n=50 from three different hospitals; II, n=250 from Ischemic Stroke Lesion Segmentation Challenge 2022) were used for external validation of the seven algorithms, testing the performance of each algorithm against the manual segmentation gold standard, with the Dice similarity coefficient (DSC) as the figure of merit. Additional tests of the six algorithms were performed after stratification by infarct volume, infarct location, and stroke onset-to-imaging time.
Domain adaptation was performed to fine-tune the algorithms with subsamples (50, 100, 200, 500, and 1,000) of the 2,777-image External dataset, and its effect was tested using a) 1,777 DW images (from the External dataset, without overlapping sample) and b) 2,159 DW images from the Internal test dataset. Results: Mean age of the 8,661 patients in the Training-and-validation dataset was 67.9 years (standard deviation 12.9), and 58.9% (n = 4,431) were male. As the subsample size of the multi-site dataset was increased from 217 to 1,732, algorithm performance increased sharply, with DSC rising from 0.58 to 0.65. When the sample size was further increased to 4,330 and 8,661, DSC increased only slightly (to 0.68 and 0.70, respectively). Similar results were seen in external tests. Although a deep learning algorithm that was developed using the Single-site dataset achieved a DSC of 0.70 (standard deviation 0.23) in the internal test, it showed substantially lower performance in the three external tests, with DSC values of 0.50, 0.51, and 0.33, respectively (all p < 0.001). Stratification of the Internal test dataset and the External dataset into small (< 1.7 ml; n = 994 and 1,046, respectively), medium (1.7-14.0 ml; n = 587 and 904, respectively), and large (> 14.0 ml; n = 446 and 825, respectively) infarct size groups showed the best performance (DSCs up to ~0.8) in the large infarct group, lower performance (up to ~0.7) in the medium infarct group, and the lowest (up to ~0.6) in the small infarct group. Deep learning algorithms performed relatively poorly on brainstem infarcts and hyperacute (< 3 h) infarcts. Domain adaptation, the use of a small subsample of external data to re-train the algorithm, successfully improved algorithm performance.
The algorithm trained with the 217 DW images from the Internal dataset and fine-tuned with an additional 50 DW images from the External dataset performed equivalently to the algorithm trained on four times as many DW images (n=866) from the Internal dataset only (without domain adaptation). Conclusion: This study, using the largest DWI dataset to date, demonstrates that a) multi-site data with ~1,000 DW images are required for developing a reliable infarct segmentation algorithm, b) domain adaptation could contribute to the generalizability of the algorithm, and c) further investigation is required to improve performance in segmenting small, brainstem, or hyperacute infarcts.
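The figure of merit used throughout this abstract can be illustrated with a minimal sketch of the Dice score between a predicted and a manually drawn binary mask, together with the infarct size grouping stated in the Results (< 1.7 ml, 1.7-14.0 ml, > 14.0 ml). The function names `dice_coefficient` and `infarct_size_group` are hypothetical. Masks are assumed to be flattened sequences of 0/1 voxel labels, and returning 1.0 when both masks are empty is a convention chosen for this sketch, not something the abstract specifies.

```python
def dice_coefficient(predicted, reference):
    """Dice score 2*|A and B| / (|A| + |B|) for two flat binary masks."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    pred_count = sum(1 for p in predicted if p)
    ref_count = sum(1 for r in reference if r)
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    if pred_count + ref_count == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * overlap / (pred_count + ref_count)


def infarct_size_group(volume_ml):
    """Assign the size group using the cut-offs quoted in the abstract."""
    if volume_ml < 1.7:
        return "small"
    if volume_ml <= 14.0:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Made-up four-voxel masks, for illustration only.
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(dice_coefficient(predicted, reference))
    print(infarct_size_group(20.0))
```

Because the denominator is the total number of lesion voxels in both masks, a fixed number of mislabeled voxels lowers the score more for a small lesion than for a large one, which is consistent with the lower scores reported for the small infarct group.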