Reduced accuracy of MRI deep grey matter segmentation in multiple sclerosis: an evaluation of four automated methods against manual reference segmentations in a multi-center cohort

Sitter, Alexandra de; Verhoeven, Tom; Burggraaff, Jessica; Simões, Jorge; Ruggieri, Serena; Palotai, Miklós; Brouwer, Iman; Versteeg, Adriaan; Wottschel, Viktor; Ropele, Stefan; Rocca, M. A.; Gasperini, Claudio; Gallo, Antonio; Yiannakas, Marios C.; Enzinger, Christian; Filippi, Massimo; Stefano, Nicola De; Kappos, Ludwig; Frederiksen, Jette Lautrup; Uitdehaag, Bernard M. J.; Barkhof, Frederik; Vrenken, Hugo

doi:10.1007/s00415-020-10023-1

Cited by 14 publications

(17 citation statements)

References 36 publications


“…This is probably due to their low contrast compared to surrounding tissue in T1-weighted MRI, which makes it more complicated to trace the edges of the thalamus in these subregions, also manually. The Bland Altman plots revealed that thalamus volumes were on average overestimated by FSL-FIRST and FreeSurfer (excepted left thalamus measurements), while they were systematically underestimated by CAT12, GIF and VolBrain, which is in line with an earlier publication on this topic (de Sitter et al, 2020). It appeared that the absolute agreement for CAT12 (ICC: 0.20–0.21), GIF and VolBrain (ICCs between 0.39 and 0.47) in our study were much worse than previously reported by de Sitter et al (2020).…”

Section: Discussion (supporting)

confidence: 88%

“…The Bland Altman plots revealed that thalamus volumes were on average overestimated by FSL-FIRST and FreeSurfer (excepted left thalamus measurements), while they were systematically underestimated by CAT12, GIF and VolBrain, which is in line with an earlier publication on this topic (de Sitter et al, 2020). It appeared that the absolute agreement for CAT12 (ICC: 0.20–0.21), GIF and VolBrain (ICCs between 0.39 and 0.47) in our study were much worse than previously reported by de Sitter et al (2020). However, different study populations and combined manual segmentations created by majority voting were used in previous work.…”

Section: Discussion (supporting)

confidence: 88%

“…and other physiological / pathological factors (e.g., age, sex, hydration, vascular risk factors etc.) (Amiri et al, 2018, de Sitter et al, 2020, Gelineau-Morel et al, 2012, Rocca et al, 2017a, Rocca et al, 2017b, Sastre-Garriga et al, 2020). Given the previously reported limitations of image analysis methods, it is important to understand how consistent and reliable the association between thalamus atrophy and cognition is when using different segmentation approaches in MS patients.…”

Section: Introduction (mentioning)

confidence: 99%


Manual and automated tissue segmentation confirm the impact of thalamus atrophy on cognition in multiple sclerosis: A multicenter study

Burggraaff, Liu, Prieto, et al. 2021. NeuroImage: Clinical.


“…A possible limitation of this study was that we did not compare FASTSURF with other existing automated segmentation techniques. However, two other studies that were recently published by our group already evaluated existing automated segmentations methods against manual references, using (partly) the same dataset (de Sitter et al, 2020, Burggraaff et al, 2020). Moreover, since this comparison would reveal any systematic difference between methods, e.g.…”

Section: Discussion (mentioning)

confidence: 99%

“…Current state-of-the-art and frequently used automated segmentation methods suffer from substantial limitations with respect to both reproducibility and accuracy, which is partly due to the presence of MS pathological changes. (Popescu et al, 2014, Popescu et al, 2016, Gelineau-Morel et al, 2012, Meijerman et al, 2018, Amiri et al, 2018, de Sitter et al, 2020) Specifically, there are various confounds that can affect the measurement of dGM atrophy: image registration and segmentation can be negatively affected by the presence of white matter lesions, (Gelineau-Morel et al, 2012, de Sitter et al, 2020) generalized or local atrophy, or subtle tissue contrast changes (Amiri et al, 2018, Westlye et al, 2009). To achieve accurate automated dGM segmentation in the presence of MS abnormalities, it is important that new methods are validated against expert reference outlines of dGM in representative MS samples.…”

Section: Introduction (mentioning)

confidence: 99%

Development and evaluation of a manual segmentation protocol for deep grey matter in multiple sclerosis: Towards accelerated semi-automated references

Sitter, Burggraaff, Bartel, et al. 2021. NeuroImage: Clinical.

Background: Deep grey matter (dGM) structures, particularly the thalamus, are clinically relevant in multiple sclerosis (MS). However, segmentation of dGM in MS is challenging; labeled MS-specific reference sets are needed for objective evaluation and training of new methods.

Objectives: This study aimed to (i) create a standardized protocol for manual delineations of dGM; (ii) evaluate the reliability of the protocol with multiple raters; and (iii) evaluate the accuracy of a fast-semi-automated segmentation approach (FASTSURF).

Methods: A standardized manual segmentation protocol for caudate nucleus, putamen, and thalamus was created, and applied by three raters on multi-center 3D T1-weighted MRI scans of 23 MS patients and 12 controls. Intra- and inter-rater agreement was assessed through intra-class correlation coefficient (ICC); spatial overlap through Jaccard Index (JI) and generalized conformity index (CIgen). From sparse delineations, FASTSURF reconstructed full segmentations; accuracy was assessed both volumetrically and spatially.

Results: All structures showed excellent agreement on expert manual outlines: intra-rater JI > 0.83; inter-rater ICC ≥ 0.76 and CIgen ≥ 0.74. FASTSURF reproduced manual references excellently, with ICC ≥ 0.97 and JI ≥ 0.92.

Conclusions: The manual dGM segmentation protocol showed excellent reproducibility within and between raters. Moreover, combined with FASTSURF a reliable reference set of dGM segmentations can be produced with lower workload.
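
For readers unfamiliar with the Jaccard Index (JI), which the abstract above uses as its spatial overlap measure, the following is a minimal sketch of the standard definition, |A ∩ B| / |A ∪ B|, applied to two binary masks. It is not the authors' implementation: the function name `jaccard_index`, the flattened-list mask representation, and the toy masks are all assumptions made for illustration.

```python
def jaccard_index(mask_a, mask_b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two flat binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as structure in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Voxels labeled as structure in at least one mask.
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Assumption of this sketch: two empty masks count as perfect agreement.
    return 1.0 if union == 0 else intersection / union


# Hypothetical toy masks (flattened); real masks would be 3D label volumes.
manual = [0, 1, 1, 1, 0, 0]
automated = [0, 0, 1, 1, 1, 0]
print(jaccard_index(manual, automated))  # 2 shared voxels / 4 in the union = 0.5
```

A JI of 1 means the two outlines coincide exactly, and 0 means they do not overlap at all.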


Technical and clinical validation of commercial automated volumetric MRI tools for dementia diagnosis—a systematic review

et al. 2021


Developments in neuroradiological MRI analysis offer promise in enhancing objectivity and consistency in dementia diagnosis through the use of quantitative volumetric reporting tools (QReports). Translation into clinical settings should follow a structured framework of development, including technical and clinical validation steps. However, published technical and clinical validation of the available commercial/proprietary tools is not always easy to find and pathways for successful integration into the clinical workflow are varied. The quantitative neuroradiology initiative (QNI) framework highlights six necessary steps for the development, validation and integration of quantitative tools in the clinic. In this paper, we reviewed the published evidence regarding regulatory-approved QReports for use in the memory clinic and to what extent this evidence fulfils the steps of the QNI framework. We summarize unbiased technical details of available products in order to increase the transparency of evidence and present the range of reporting tools on the market. Our intention is to assist neuroradiologists in making informed decisions regarding the adoption of these methods in the clinic. For the 17 products identified, 11 companies have published some form of technical validation on their methods, but only 4 have published clinical validation of their QReports in a dementia population. Upon systematically reviewing the published evidence for regulatory-approved QReports in dementia, we concluded that there is a significant evidence gap in the literature regarding clinical validation, workflow integration and in-use evaluation of these tools in dementia MRI diagnosis.
