This commentary reviews a specific issue in selecting the analytical tool used to compare the estimated performance of systems under the receiver operating characteristic (ROC) paradigm. The issue concerns the possible impact of the last experimentally ascertained ROC data point, that is, the point with the highest true positive and false positive fractions. An example is presented in which the choice of analysis approach changes the study conclusion from nonsignificant (p=0.75) under the parametric analysis to significant (p=0.003) under the non-parametric analysis, followed by recommendations that should help avoid misinterpretation of the results.
Keywords: Technology Assessment; Performance; ROC; Parametric Analysis; Non-Parametric Analysis
Currently, observer performance studies are routinely performed for the assessment and comparison of technologies and practices, and the area (AUC) under the receiver operating characteristic (ROC) curve is the most frequently used summary index when comparing different modalities (1-5). In the medical imaging field, conclusions resulting from important pivotal studies are often made based on the assessment of differences between areas under estimated ROC curves that include substantial portions near which no experimental data lie. Hence, in most instances, it would be preferable to assess differences between partial AUCs (6,7). Unfortunately, the large variability frequently associated with these studies would necessitate extremely large sample sizes to enable demonstration of significant differences between partial AUCs, making this approach impractical in many situations. As a result, we continue to do the best we can under the circumstances, namely estimate the differences between the AUCs and assess the significance level of these differences, if any.
A variety of parametric and non-parametric approaches have been developed to make statistical inferences based on AUC in experiments that are performed under the ROC paradigm (8-17). An implicit assumption in all of these approaches is that even if the estimated AUCs are not precise in absolute terms, the comparison of two or more AUCs on a relative scale is frequently valid regardless of the analytical approach taken to estimate the individual ROC curves being compared. All of the methods employed to date extrapolate (or more precisely