Background
Statistical prediction tools are increasingly common in contemporary medicine, but there is considerable disagreement about how they should be evaluated. Three tools (the Partin tables, the European Society for Urological Oncology (ESUO) criteria and the Gallina nomogram) have been proposed for the prediction of seminal vesicle invasion (SVI) in patients with clinically localized prostate cancer. We aimed to determine which of these tools, if any, should be used clinically.
Methods
The independent validation cohort consisted of 2584 patients treated surgically for clinically localized prostate cancer between 2002 and 2007 at one of four North American tertiary-care referral centers. Traditional statistical methods (area under the receiver operating characteristic curve (AUC), calibration plots, the Brier score, sensitivity and specificity, and positive and negative predictive values) and novel methods (risk stratification tables, the net reclassification index, decision curve analysis and predictiveness curves) were used to quantify the predictive abilities of the three tested models.
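For readers less familiar with the traditional metrics named above, they can be computed directly from predicted risks and observed outcomes. The following minimal Python sketch (using synthetic labels and probabilities for illustration, not the study data) implements the Brier score and the AUC in its rank-based form:

```python
import numpy as np

def brier_score(y_true, y_prob):
    # Mean squared difference between predicted probability and the
    # binary outcome (0 = no SVI, 1 = SVI); lower is better.
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def auc(y_true, y_prob):
    # Probability that a randomly chosen positive case is assigned a
    # higher predicted risk than a randomly chosen negative case
    # (ties count as 0.5); 0.5 = no discrimination, 1.0 = perfect.
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Illustrative example: a model that ranks all cases correctly.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.2, 0.8, 0.9]
print(brier_score(outcomes, risks), auc(outcomes, risks))
```

Note that both metrics assume continuous predicted probabilities; a binary decision rule such as the ESUO criteria yields only one operating point, which is one reason these measures can be biased against it.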
Results
Traditional statistical methods (receiver operating characteristic (ROC) plots and Brier scores), as well as two of the novel statistical methods (risk stratification tables and the net reclassification index), could not clearly distinguish between the SVI prediction tools. For example, ROC plots and Brier scores appeared biased against the binary decision tool (the ESUO criteria) and gave discordant results for the continuous predictions of the Partin tables and the Gallina nomogram. The results of the calibration plots were discordant with those of the ROC plots. Conversely, decision curve analysis clearly indicated that the Partin tables represent the best strategy for stratifying the risk of SVI.
Conclusions
Based on the decision curve analysis results, surgeons should consider using the Partin tables to predict SVI. Decision curve analysis provided clinically meaningful comparisons between predictive models, whereas the other statistical methods for evaluating prediction models gave inconsistent results that were difficult to interpret.