Three validation metrics for automated probabilistic image segmentation of brain tumours

Zou, Kelly H.; Wells, William M.; Kikinis, Ron; Warfield, Simon K.

doi:10.1002/sim.1723

Cited by 89 publications

(83 citation statements)

References 54 publications

Section: Discussion (mentioning)

confidence: 99%

“…We have applied our recently developed, Simultaneous Truth and Performance Level Estimation (STAPLE) program (outlined in Appendix A.2.) (17)(18)(19), an automated expectation-maximization (EM) algorithm (20) for estimating the composite gold standard. For each pixel, a maximum likelihood estimate of the composite gold standard of tumor or background class was optimally determined over all image readers' results.…”

Section: Methods (mentioning)

confidence: 99%

“…In Example 1, stratified ROC analyses were performed in each tumor case and type against the estimated composite voxel-wise gold standard using the STAPLE program (17)(18)(19) based on an EM algorithm (20). Over all thresholds γ (γ ∈ [0,1]), the four bi-beta ROC parameters were estimated via matching moments (see details in Appendix A.4).…”
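The moment-matching step mentioned in this statement can be illustrated with a minimal sketch. It is not the authors' Appendix A.4 procedure, which is not reproduced on this page. It assumes that tumor and background fractional scores each follow a beta distribution (two parameters each, four in total), fits each by matching the sample mean and variance, and approximates the area under the resulting ROC curve on a midpoint grid. The function names `fit_beta_moments` and `bibeta_auc` and the grid size are hypothetical choices made for illustration.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def fit_beta_moments(values: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimate of (alpha, beta) for scores in (0, 1)."""
    m = fmean(values)
    v = pvariance(values)  # population variance; an assumption of this sketch
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("moments are not compatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def _beta_weights(alpha: float, beta: float, n_grid: int) -> List[float]:
    """Beta density evaluated at grid midpoints, normalised to sum to 1."""
    xs = [(i + 0.5) / n_grid for i in range(n_grid)]
    dens = [
        math.exp((alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x))
        for x in xs
    ]
    total = sum(dens)
    return [d / total for d in dens]


def bibeta_auc(
    tumor: Tuple[float, float],
    background: Tuple[float, float],
    n_grid: int = 2000,
) -> float:
    """Approximate AUC = P(tumor score > background score) under a bi-beta model."""
    w1 = _beta_weights(tumor[0], tumor[1], n_grid)
    w0 = _beta_weights(background[0], background[1], n_grid)
    auc = 0.0
    cum0 = 0.0  # background probability mass strictly below the current cell
    for p1, p0 in zip(w1, w0):
        auc += p1 * (cum0 + 0.5 * p0)  # ties within a cell count as one half
        cum0 += p0
    return auc
```

Sweeping the threshold γ over [0,1] traces the ROC curve; the grid sum above integrates over those thresholds directly, so no explicit list of ROC points is needed for the AUC.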

Section: Methods (mentioning)

confidence: 99%

“…However, segmentor-specific quality, Q_0r and Q_1r are unknown. We have developed a software (STAPLE) to iteratively estimate the voxel-wise gold standard using an EM-algorithm (17)(18)(19)(20). This algorithm is briefly outlined as follows, with k = 1, …, K iterations till convergence:…”
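The iteration described in this statement can be sketched for the binary case as follows. This is a simplified illustration of a STAPLE-style EM loop, not the authors' software: it assumes binary reader decisions per voxel, treats the two segmentor-specific quality parameters as sensitivity and specificity (an interpretation of Q_1r and Q_0r made here for illustration), and uses the overall fraction of positive votes as a fixed prior. The function name `staple_binary` and its defaults are hypothetical.

```python
from typing import List, Tuple


def staple_binary(
    decisions: List[List[int]],
    n_iter: int = 50,
    tol: float = 1e-8,
    init: float = 0.9,
) -> Tuple[List[float], List[float], List[float]]:
    """Estimate per-voxel tumor probabilities and per-reader quality by EM.

    decisions[i][j] is 1 if reader j labelled voxel i as tumor, else 0.
    Returns (weights, sensitivity, specificity).
    """
    n_vox = len(decisions)
    n_raters = len(decisions[0])
    # Fixed prior probability of tumor: fraction of positive votes (assumption).
    prior = sum(sum(row) for row in decisions) / (n_vox * n_raters)
    sens = [init] * n_raters
    spec = [init] * n_raters
    weights = [prior] * n_vox

    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly tumor.
        new_weights = []
        for row in decisions:
            a = prior        # likelihood of the votes if the voxel is tumor
            b = 1.0 - prior  # likelihood of the votes if it is background
            for j, d in enumerate(row):
                if d:
                    a *= sens[j]
                    b *= 1.0 - spec[j]
                else:
                    a *= 1.0 - sens[j]
                    b *= spec[j]
            new_weights.append(a / (a + b) if a + b > 0.0 else prior)

        # M-step: re-estimate each reader's sensitivity and specificity.
        total_fg = sum(new_weights)
        total_bg = n_vox - total_fg
        for j in range(n_raters):
            if total_fg > 0.0:
                sens[j] = sum(
                    w for w, row in zip(new_weights, decisions) if row[j]
                ) / total_fg
            if total_bg > 0.0:
                spec[j] = sum(
                    1.0 - w for w, row in zip(new_weights, decisions) if not row[j]
                ) / total_bg

        change = max(abs(x - y) for x, y in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break

    return weights, sens, spec
```

Thresholding the returned weights at 0.5 gives one possible composite binary gold standard; readers who often disagree with the consensus end up with lower estimated sensitivity or specificity and therefore less influence on it.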

Section: A2 Expectation-maximization Algorithm For Estimating a Com… (mentioning)

confidence: 99%

“…For comparing two sets of segmentation results, there are existing validation metrics other than area under the ROC curve, for example, entropy-based mutual information (26), Jaccard (27) and Dice (28) similarity coefficients, and the Hausdorff distance measure (29,30). We have already investigated such metrics in separate articles (19,31). In summary, we have conducted parametric evaluations of two types of continuous classification data using ROC analysis, with application to three clinical examples. The proposed method may be adapted to several validation tasks in radiologic research, as illustrated in our clinical examples.…”
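For readers unfamiliar with the overlap metrics named in this statement, the following is a minimal sketch of the Dice and Jaccard similarity coefficients for two binary segmentations. It assumes each segmentation is given as a flattened sequence of 0/1 labels of equal length; the function names and the convention that two empty masks score 1.0 are choices made here for illustration.

```python
from typing import Sequence, Tuple


def overlap_counts(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int]:
    """Return (|A ∩ B|, |A|, |B|) for two flattened binary masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for x, y in zip(a, b) if x and y)
    return inter, sum(1 for x in a if x), sum(1 for y in b if y)


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter, n_a, n_b = overlap_counts(a, b)
    if n_a + n_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * inter / (n_a + n_b)


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    inter, n_a, n_b = overlap_counts(a, b)
    union = n_a + n_b - inter
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return inter / union
```

The two coefficients carry the same information, since J = D / (2 - D); Dice weights the overlap more heavily, so it is never smaller than Jaccard for the same pair of masks.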

mentioning

confidence: 99%

Statistical validation based on parametric receiver operating characteristic analysis of continuous classification data

Zou, Warfield, Tempany et al. 2003

Academic Radiology

Self Cite

Rationale and Objectives-The accuracy of diagnostic tests and imaging segmentation is important in clinical practice because it has a direct impact on therapeutic planning. Statistical validation of classification accuracy was conducted based on parametric receiver operating characteristic analysis, illustrated on three radiologic examples.

Materials and Methods-Two parametric models were developed for diagnostic or imaging data. Example 1: A semiautomated fractional segmentation algorithm was applied to magnetic resonance imaging of nine cases of brain tumors. The tumor and background pixel data were assumed to have bi-beta distributions. Fractional segmentation was validated against an estimated composite pixelwise gold standard based on multi-reader manual segmentations. Example 2: The predictive value of ureteral stone size, measured by spiral computed tomography in 100 cases and distributed as bi-normal after a nonlinear transformation, was evaluated under the two treatment options received. Example 3: One hundred eighty cases had prostate-specific antigen levels measured in a prospective clinical trial. Radical prostatectomy was performed in all to provide a binary gold standard of local and advanced cancer stages. Prostate-specific antigen level was transformed and modeled by bi-normal distributions. In all examples, areas under the receiver operating characteristic curves were computed.

Conclusion-All clinical examples yielded fair to excellent accuracy. The validation metric, area under the receiver operating characteristic curve, may be generalized to evaluating the performance of several continuous classifiers related to imaging.

Results-The…

Keywords-Brain segmentation; magnetic resonance; prostate specific antigen (PSA); genitourinary system; computed tomography; receiver operating characteristic (ROC) analysis

The accuracy of diagnostic tests and imaging segmentation is important in clinical practice because it has a direct impact on therapeutic planning. Recently, continuous classification tools … In contrast, traditional diagnostic tests were often based on an ordinal rating scale. For example, a five-point scale might be adopted for observer performance evaluations, where 1 = definitely normal, 2 = probably normal, 3 = possibly abnormal, 4 = probably abnormal, and 5 = definitely abnormal. A discrete subjective rating method was used in a multi-modal (magnetic resonance [MR], computed tomography [CT], and ultrasound) comparative ovarian cancer technology assessment study (5,6), one of a series of prospective multicenter Radiologic Diagnostic Oncology Group clinical trials funded by the National Institutes of Health in the 1990s. The advantages of a continuous diagnostic scale over an ordinal one are that detailed information is preserved, that it is more natural given advances in measurement tools and computing methods, and that it enables more objective interpretation. Ordinal rating data will not be the focus of this article. Instead, we will evaluate the performance of continuous classifiers only.

To conduct a...

Testing for human papillomavirus in cervical cancer screening

Nishino, Tambouret, Wilbur 2011

Cancer Cytopathology

High-risk human papillomavirus (hrHPV) testing has become an integral component of cervical cancer screening, given that persistent infection with hrHPV is recognized as a significant risk factor for most precancers and cancers of the cervix. In particular, testing for hrHPV types (in conjunction with cervical cytology) has been approved for primary screening in women over 30 years of age and for cost-effective triaging of equivocal cervical cytology results. HPV is a small double-stranded DNA virus that cannot be cultured in vitro; therefore, different types of tests have been developed to detect its presence. Various molecular techniques are available for detecting the presence and/or quantity of hrHPV. In this review, the testing options for hrHPV and its surrogates, with an emphasis on those approved by the US Food and Drug Administration (FDA), are detailed. Cancer (Cancer Cytopathol) 2011;119:219-27. © 2011 American Cancer Society.

KEY WORDS: HPV testing, human papillomavirus, cervical cancer screening.

High-risk human papillomavirus (hrHPV) testing has become an integral component of cervical cancer screening. Epidemiologic studies provide strong evidence supporting hrHPV infection as a necessary stage in the development of cervical cancer (1-5). Although cervical infection with hrHPV is common, the majority of these cases are self-limited and resolve within 2 years (6-8). In a small percentage, the infection persists and allows for the overexpression of key HPV oncoprotein-producing genes including E6 and E7. These proteins play an important role in cell cycle dysregulation and promote uncontrolled cell survival and proliferation in a multistep progression that may culminate with invasive carcinoma (9). With the recognition that persistent infection with hrHPV is a significant risk factor and a necessary precursor for most precancers and cancers of the cervix, hrHPV testing has become an important facet of large-scale screening programs. In particular, testing for hrHPV types (in conjunction with cervical cytology) has been approved for primary screening in women over 30 years of age and for cost-effective triaging of equivocal cervical cytology results. In this review, the testing options for hrHPV and its surrogates, with an emphasis on those approved by the US Food and Drug Administration (FDA), will be detailed.

Quantitative evaluation of automated skull‐stripping methods applied to contemporary and legacy images: Effects of diagnosis, bias correction, and slice location

Fennema‐Notestine, Özyurt, Clark et al. 2005

Human Brain Mapping

Performance of automated methods to isolate brain from nonbrain tissues in magnetic resonance (MR) structural images may be influenced by MR signal inhomogeneities, type of MR image set, regional anatomy, and age and diagnosis of subjects studied. The present study compared the performance of four methods: Brain Extraction Tool (BET; Smith [2002]: Hum Brain Mapp 17:143-155); 3dIntracranial (Ward [1999] Milwaukee: Biophysics Research Institute, Medical College of Wisconsin; in AFNI); a Hybrid Watershed algorithm (HWA, Segonne et al. [2004] Neuroimage 22:1060-1075; in FreeSurfer); and Brain Surface Extractor (BSE, Sandor and Leahy [1997] IEEE Trans Med Imag 16:41-54; Shattuck et al. [2001] Neuroimage 13:856-876) to manually stripped images. The methods were applied to uncorrected and bias-corrected datasets; Legacy and Contemporary T1-weighted image sets; and four diagnostic groups (depressed, Alzheimer's, young and elderly control). To provide a criterion for outcome assessment, two experts manually stripped six sagittal sections for each dataset in locations where brain and nonbrain tissue are difficult to distinguish. Methods were compared on Jaccard similarity coefficients, Hausdorff distances, and an Expectation-Maximization algorithm. Methods tended to perform better on contemporary datasets; bias correction did not significantly improve method performance. Mesial sections were most difficult for all methods. Although AD image sets were most difficult to strip, HWA and BSE were more robust across diagnostic groups compared with 3dIntracranial and BET. With respect to specificity, BSE tended to perform best across all groups, whereas HWA was more sensitive than other methods. The results of this study may direct users towards a method appropriate to their T1-weighted datasets and improve the efficiency of processing for large, multisite neuroimaging studies.
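The Hausdorff distance used as a comparison metric in this abstract can be sketched as follows. This is an illustrative brute-force version, not the implementation used in the study: it assumes each segmentation boundary is given as a non-empty collection of point coordinates and uses Euclidean distance. The function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a point in a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Iterable[Point], b: Iterable[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    pts_a, pts_b = list(a), list(b)
    if not pts_a or not pts_b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(pts_a, pts_b), directed_hausdorff(pts_b, pts_a))
```

Unlike overlap coefficients such as Jaccard, this measure is driven by the single worst boundary mismatch, so one stray region can dominate the score. The brute-force search costs time proportional to the product of the two set sizes, which is acceptable for single slices but slow for large volumes.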
