The relatively low ICC and CI indicate that delineation variability among observers was large for both the left and right hippocampus. The posterior and anterior-medial borders have the largest delineation inaccuracy. The hippocampus D constraint was not violated.
Purpose Precise and reproducible hippocampus outlining is important to quantify hippocampal atrophy caused by neurodegenerative diseases and to spare the hippocampus in whole brain radiation therapy when performing prophylactic cranial irradiation or treating brain metastases. This study aimed to quantify systematic differences between methods by comparing regional volume and outline reproducibility of manual, FSL-FIRST and FreeSurfer hippocampus segmentations. Materials and methods This study used a dataset from ADNI (Alzheimer’s Disease Neuroimaging Initiative), including 20 healthy controls, 40 patients with mild cognitive impairment (MCI), and 20 patients with Alzheimer’s disease (AD). For each subject, back-to-back (BTB) T1-weighted 3D MPRAGE images were acquired at baseline (BL) and 12 months later (M12). Hippocampus segmentations from all methods were converted into triangulated meshes, regional volumes were extracted, and regional Jaccard indices were computed between the hippocampus meshes of paired BTB scans to evaluate reproducibility. Regional volumes and Jaccard indices were modelled as a function of group (G), method (M), hemisphere (H), time-point (T), region (R) and interactions. Results For the volume data, the model selection procedure yielded the significant main effects G, M, H, T and R and the interaction effects G-R and M-R. The same model was found for the BTB scans. For all methods, volumes decreased with the severity of disease. Significant fixed effects for the regional Jaccard index data were M, R and the interaction M-R. For all methods the middle region was most reproducible, independent of diagnostic group. FSL-FIRST was most and FreeSurfer least reproducible. Discussion/Conclusion A novel method to perform detailed analysis of subtle differences in hippocampus segmentation is proposed. The method showed that hippocampal segmentation reproducibility was best for FSL-FIRST and worst for FreeSurfer. We also found systematic regional differences in hippocampal segmentation between methods, reinforcing the need to adopt harmonized protocols.
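The Jaccard index used above as a reproducibility measure can be illustrated with a minimal sketch. The study computed it regionally on triangulated meshes; the version below is a simplification that assumes each segmentation is available as a set of voxel coordinates, and the function name `jaccard_index` and the example voxels are hypothetical.

```python
def jaccard_index(seg_a, seg_b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two segmentations.

    Each segmentation is a set of (x, y, z) voxel coordinates.
    Assumption: two empty segmentations are treated as identical (1.0).
    """
    union = seg_a | seg_b
    if not union:
        return 1.0
    return len(seg_a & seg_b) / len(union)


# Hypothetical voxel sets standing in for a pair of back-to-back scans.
scan_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
scan_2 = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}
print(jaccard_index(scan_1, scan_2))  # 2 shared voxels out of 4 in the union
```

A value of 1 means the two outlines coincide exactly, and 0 means they do not overlap at all.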
Objective To compare the performance of different methods for determining hippocampal atrophy rates using longitudinal MRI scans in aging and Alzheimer's disease (AD). Background Quantifying hippocampal atrophy caused by neurodegenerative diseases is important to follow the course of the disease. In dementia, the efficacy of new therapies can be partially assessed by measuring their effect on hippocampal atrophy. In radiotherapy, the quantification of radiation-induced hippocampal volume loss is of interest to quantify radiation damage. We evaluated plausibility, reproducibility and sensitivity of eight commonly used methods to determine hippocampal atrophy rates using test-retest scans. Materials and methods Manual, FSL-FIRST, FreeSurfer, multi-atlas segmentation (MALF) and non-linear registration methods (Elastix, NiftyReg, ANTs and MIRTK) were used to determine hippocampal atrophy rates on longitudinal T1-weighted MRI from the ADNI database. Appropriate parameters for the non-linear registration methods were determined using a small training dataset (N = 16) in which two-year hippocampal atrophy was measured using test-retest scans of 8 subjects with low and 8 subjects with high atrophy rates. On a larger dataset of 20 controls (CTRL), 40 mild cognitive impairment (MCI) and 20 AD patients, one-year hippocampal atrophy rates were measured. A repeated measures ANOVA was performed to determine differences between controls, MCI and AD patients. For each method we calculated effect sizes and the required sample sizes to detect one-year volume change between controls and MCI (N_CTRL_MCI) and between controls and AD (N_CTRL_AD). Finally, reproducibility of hippocampal atrophy rates was assessed using within-session rescans and expressed as an average distance measure D_Ave, which expresses the difference in atrophy rate, averaged over all subjects. The same D_Ave was used to determine the agreement between different methods.
Results Except for MALF, all methods detected a significant group difference between CTRL and AD, but none found a significant difference between CTRL and MCI. FreeSurfer and MIRTK required the lowest sample sizes (FreeSurfer: N_CTRL_MCI = 115, N_CTRL_AD = 17 with D_Ave = 3.26%; MIRTK: N_CTRL_MCI = 97, N_CTRL_AD = 11 with D_Ave = 3.76%), while ANTs was most reproducible (N_CTRL_MCI = 162, N_CTRL_AD = 37 with D_Ave = 1.06%), followed by Elastix (N_CTRL_MCI = 226, N_CTRL_AD = 15 with D_Ave = 1.78%) and NiftyReg (N_CTRL_MCI = 193, ...
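To make the link between effect size and required sample size concrete, here is a minimal sketch using the common normal-approximation formula for a two-sided, two-sample comparison, n = 2 (z_{1-α/2} + z_{power})² / d² per group. The abstract does not state which formula, significance level or power the authors used, so the defaults below (α = 0.05, power = 0.8) and the function name `sample_size_per_group` are assumptions, and the result is not expected to reproduce the reported sample sizes.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Subjects needed per group to detect a standardized effect size
    (Cohen's d) with a two-sided, two-sample test (normal approximation).

    alpha and power are assumed defaults, not values from the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)


# Illustrative effect sizes only: larger effects need fewer subjects.
for d in (0.5, 1.0):
    print(d, sample_size_per_group(d))
```

Because the required n scales with 1/d², halving the effect size roughly quadruples the number of subjects needed per group.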
Background Deep grey matter (dGM) structures, particularly the thalamus, are clinically relevant in multiple sclerosis (MS). However, segmentation of dGM in MS is challenging; labeled MS-specific reference sets are needed for objective evaluation and training of new methods. Objectives This study aimed to (i) create a standardized protocol for manual delineations of dGM; (ii) evaluate the reliability of the protocol with multiple raters; and (iii) evaluate the accuracy of a fast-semi-automated segmentation approach (FASTSURF). Methods A standardized manual segmentation protocol for caudate nucleus, putamen, and thalamus was created, and applied by three raters on multi-center 3D T1-weighted MRI scans of 23 MS patients and 12 controls. Intra- and inter-rater agreement was assessed through intra-class correlation coefficient (ICC); spatial overlap through Jaccard Index (JI) and generalized conformity index (CIgen). From sparse delineations, FASTSURF reconstructed full segmentations; accuracy was assessed both volumetrically and spatially. Results All structures showed excellent agreement on expert manual outlines: intra-rater JI > 0.83; inter-rater ICC ≥ 0.76 and CIgen ≥ 0.74. FASTSURF reproduced manual references excellently, with ICC ≥ 0.97 and JI ≥ 0.92. Conclusions The manual dGM segmentation protocol showed excellent reproducibility within and between raters. Moreover, combined with FASTSURF a reliable reference set of dGM segmentations can be produced with lower workload.
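Inter-rater agreement on volumes, as summarized by the ICC above, can be sketched as follows. The abstract does not say which ICC form was used; this example assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), and the function name `icc_2_1` and the volumes are hypothetical illustrations, not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject, one column per rater."""
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up thalamus volumes (mL) from three raters on four scans.
volumes = [
    [7.1, 7.3, 7.0],
    [6.2, 6.5, 6.1],
    [8.0, 8.1, 7.8],
    [5.5, 5.9, 5.6],
]
print(round(icc_2_1(volumes), 3))
```

Because this form measures absolute agreement, a rater who is systematically larger or smaller than the others lowers the ICC even when the ranking of subjects is preserved.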