In his commentary, Dr Vickers repeats some of his previous criticisms of the NRI and IDI measures and challenges us to find examples where the NRI at event rate (NRI(p)) offers information over and above net benefit [1]. However, this challenge makes little sense. NRI(p) is a single-number statistical summary measure. Net benefit should be presented across a range of thresholds; thus one should consider a net benefit curve, which is an important and useful contribution of Drs Vickers and Elkin [2]. NRI(p) is the difference between two points on the standardized net benefit curves, each evaluated at the event rate. A point on a curve cannot contain as much information as the curve itself, let alone more.
Instead, one might contrast NRI(p) with the change in the area under the ROC curve (AUC) or, more directly, the maximum standardized net benefit (the parent measure of NRI(p)) with the AUC. We have shown that the maximum standardized net benefit is a global measure that does not depend on the event rate itself [3]. Moreover, it is proper and cannot be "fooled" by miscalibration. In these regards, it shares the properties of the AUC. Its connection with several other statistical summary measures affords a richer interpretation. Indeed, the measure can be interpreted as the Kolmogorov-Smirnov distance between the risk distributions among events and nonevents as well as the maximum relative utility or standardized net benefit. We argue that presenting standardized net benefit curves with maximum standardized net benefit as a companion single-number summary offers a more elegant and interpretable pairing than standardized net benefit and the AUC. Moreover, Baker [4] shows that the inverse of the product of the NRI at event rate and the event rate can be interpreted as the summary test trade-off, ie, an approximate lower bound over all thresholds for the minimum number of tests for a new marker that needs to be traded for a true positive to yield a positive net benefit [5].
We agree with Dr Vickers that a thoughtful discussion about a range of classification thresholds can be useful. However, the choice of any threshold, or even a threshold range, can be arbitrary, is likely to vary from person to person, and is subject to change and debate. The thresholds used in primary prevention of cardiovascular disease are a good example. When introducing the NRI in 2008, we used 6% and 20%, consistent with the practice at the time [6]. However, the 2013 American Heart Association/American College of Cardiology guidelines lowered the thresholds to 5% and 7.5% and, at the same time, expanded the definition of the outcome used in the risk prediction model [7]. The subsequent US Preventive Services Task Force guideline raised the threshold to 10% [8], but future guidelines may lower the threshold again. The biomarker discovery process needs a more stable grounding. Furthermore, some researchers argue that model and biomarker evaluation needs a continuous framework. That is why we need global measures of model performance. AUC, maximum standardized net benefit, and R-squared-type measures a...
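The relationship stated earlier, that NRI(p) is a difference between two points on standardized net benefit curves at the event rate, can be illustrated with a minimal Python sketch. It assumes the common definition of standardized net benefit as net benefit divided by the event rate, which this letter does not spell out. The function names and the toy data are hypothetical and chosen only for illustration. The Kolmogorov-Smirnov distance is computed here as the largest absolute gap between the true-positive and false-positive rates over the observed risk values.

```python
import numpy as np

def rates(risk, y, t):
    """True- and false-positive rates when risk >= t is classified as positive."""
    positive = risk >= t
    return positive[y == 1].mean(), positive[y == 0].mean()

def standardized_net_benefit(risk, y, t):
    """Net benefit at threshold t divided by the event rate p (assumed definition)."""
    p = y.mean()
    tpr, fpr = rates(risk, y, t)
    return tpr - fpr * ((1 - p) / p) * (t / (1 - t))

def nri_at_event_rate(risk_old, risk_new, y):
    """Difference between the two models' standardized net benefit at t = p."""
    p = y.mean()
    return (standardized_net_benefit(risk_new, y, p)
            - standardized_net_benefit(risk_old, y, p))

def ks_distance(risk, y):
    """Largest |TPR - FPR| over the observed risk values."""
    return max(abs(tpr - fpr)
               for tpr, fpr in (rates(risk, y, t) for t in np.unique(risk)))

# Hypothetical toy data: 4 events, 6 nonevents, so the event rate is 0.4.
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
risk_old = np.array([0.5, 0.3, 0.45, 0.2, 0.5, 0.3, 0.1, 0.2, 0.6, 0.35])
risk_new = np.array([0.7, 0.5, 0.6, 0.3, 0.3, 0.2, 0.1, 0.2, 0.5, 0.3])

if __name__ == "__main__":
    print("NRI(p):", nri_at_event_rate(risk_old, risk_new, y))
    print("sNB at p (new model):", standardized_net_benefit(risk_new, y, y.mean()))
    print("KS distance (new model):", ks_distance(risk_new, y))
```

At the threshold t = p the weighting factor in the standardized net benefit reduces to 1, so the value at that point is simply the true-positive rate minus the false-positive rate. This is why a single point on the curve can be compared with the Kolmogorov-Smirnov distance, which by construction is at least as large.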