Clinical risk reclassification at 10 years

Cook, Nancy R.; Demler, Olga; Paynter, Nina P.

doi:10.1002/sim.7340

Cited by 11 publications

(8 citation statements)

References 30 publications


“…It is asymptotically equivalent to the proportion of the explained variation, a generalization of R² [35, 36], and is thus related to the likelihood or change in entropy. The IDI as well as the NRI, however, can be strongly affected by the event rate [26]. As for R² measures for binary or survival models, the values of the IDI are typically low and difficult to interpret.…”

Section: Category-free Methods

mentioning

confidence: 99%
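As a minimal sketch of the quantity discussed in the excerpt above, the IDI can be computed as the difference in discrimination slopes between two models, where the discrimination slope is the mean predicted risk among events minus the mean predicted risk among non-events. The function names and the toy risks below are hypothetical and are not taken from the cited papers.

```python
from statistics import mean


def discrimination_slope(risks, events):
    """Mean predicted risk among events minus mean among non-events."""
    event_risks = [r for r, y in zip(risks, events) if y == 1]
    nonevent_risks = [r for r, y in zip(risks, events) if y == 0]
    return mean(event_risks) - mean(nonevent_risks)


def idi(old_risks, new_risks, events):
    """Integrated discrimination improvement: change in discrimination slope."""
    return discrimination_slope(new_risks, events) - discrimination_slope(old_risks, events)


# Hypothetical toy data: 2 events, 4 non-events.
events = [1, 1, 0, 0, 0, 0]
old_risks = [0.4, 0.3, 0.3, 0.2, 0.2, 0.1]
new_risks = [0.5, 0.4, 0.2, 0.2, 0.1, 0.1]

print(round(idi(old_risks, new_risks, events), 4))
```

Because both slopes are differences of average risks, the IDI is bounded by how far apart the two risk distributions can be, which is one way to see why its values tend to be small when events are rare.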

“…It is a proper measure of global discrimination measure that serves as a measure of distance between the distributions of risk between events and non-events. A problem is that it may not be clinically relevant [26, 27]. If a model is calibrated in the large, so that the average predicted risk equals the observed event rate, the cutoff will be at the mean predicted risk.…”

Section: Risk Reclassification

mentioning

confidence: 99%
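The two-category NRI with the cutoff placed at the event rate, as described in the excerpt above, can be sketched as follows. This is an illustration under stated assumptions: a risk at or above the cutoff counts as high risk, the helper names are hypothetical, and the toy risks are constructed so that both models are calibrated in the large (mean predicted risk equals the event rate of 0.25).

```python
def nri_at_threshold(old_risks, new_risks, events, threshold):
    """Two-category NRI: net upward moves in events plus net downward moves in non-events."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    for old, new, y in zip(old_risks, new_risks, events):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = int(new_high and not old_high)
        moved_down = int(old_high and not new_high)
        if y == 1:
            up_event += moved_up
            down_event += moved_down
        else:
            up_nonevent += moved_up
            down_nonevent += moved_down
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    return (up_event - down_event) / n_events + (down_nonevent - up_nonevent) / n_nonevents


def nri_at_event_rate(old_risks, new_risks, events):
    """NRI(p): the cutoff is the observed event rate p."""
    event_rate = sum(events) / len(events)
    return nri_at_threshold(old_risks, new_risks, events, event_rate)


# Hypothetical toy data: event rate 2/8 = 0.25, mean predicted risk 0.25 for both models.
events = [1, 1, 0, 0, 0, 0, 0, 0]
old_risks = [0.40, 0.20, 0.40, 0.20, 0.20, 0.20, 0.20, 0.20]
new_risks = [0.50, 0.30, 0.20, 0.20, 0.20, 0.20, 0.20, 0.20]

print(round(nri_at_event_rate(old_risks, new_risks, events), 4))
```

With a single cutoff, this quantity equals the change in sensitivity plus the change in specificity at that cutoff.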

“…The NRI(p) based on this event rate may classify too many individuals as high risk, particularly if the outcome is rare. The low prevalence of rare events offers a unique challenge and also affects the performance of the various measures [26]. If the prevalence is as low as 1/2000, the cut point based on this would be 0.0005, equivalent to a cost tradeoff of 2000 to 1.…”

Section: Choice Of Risk Thresholds

mentioning

confidence: 99%
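The arithmetic in the excerpt above can be checked with a short sketch. It assumes the usual decision-theoretic reading of a risk threshold t, under which treating someone as high risk is worthwhile only when the cost of a missed event relative to a false positive is at least (1 - t) / t. The function name is hypothetical.

```python
from fractions import Fraction


def cost_ratio_at_threshold(threshold):
    """Implied cost of a missed event relative to a false positive: (1 - t) / t."""
    return (1 - threshold) / threshold


prevalence = Fraction(1, 2000)   # event rate used as the cut point
ratio = cost_ratio_at_threshold(prevalence)

print(float(prevalence))  # 0.0005
print(ratio)              # 1999
```

The exact ratio is 1999 to 1, which the excerpt rounds to 2000 to 1.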


Quantifying the added value of new biomarkers: how and how not

2018

Self Cite


Over the past few decades, interest in biomarkers to enhance predictive modeling has soared. Methodology for evaluating these has also been an active area of research. There are now several performance measures available for quantifying the added value of biomarkers. This commentary provides an overview of methods currently used to evaluate new biomarkers, describes their strengths and limitations, and offers some suggestions on their use.

Keywords: Biomarkers, Model fit, Calibration, Reclassification, Clinical utility

During the past few decades, there has been an explosion of work on the use of biomarkers in predictive modeling and whether it is useful to include these when evaluating risk of clinical events. As new biologic mechanisms have been discovered, genetic markers evolved, and new assays developed, questions about the usefulness of new markers for clinical prediction have been debated. In cardiology, several strong risk factors for cardiovascular disease, namely cholesterol levels, blood pressure, smoking, and diabetes, have been well-known for decades [1] and have been incorporated into clinical practice. They have also been included in predictive models for cardiovascular disease, primarily developed in the Framingham Heart Study [2]. Since then, many new markers with more modest effects have been discovered as new biologic pathways have been unearthed. In fields which have less powerful predictors to date, development and addition of predictive markers may be even more important.

As interest in biomarkers has soared, so has the methodology used to evaluate their utility. There are now several performance measures available for quantifying the added value of biomarkers (Table 1), several of which have been proposed in the last decade. This commentary provides an overview of methods currently used to evaluate new biomarkers, describes their strengths and limitations, and offers some suggestions on their use.

Likelihood functions

A fundamental construct for much of statistical modeling is the likelihood function. This reflects the probability, or "likelihood," of obtaining the observed data under the assumed model, including the selected variables and their associated parameters [3]. As more variables are added and the model fits the data better, the probability of obtaining the data that are actually observed improves. Much of statistical theory is based on this function. Thus, the primordial criterion of whether new variables, including biomarkers, can add to or improve a model is whether and by how much the likelihood increases. When the models are nested, we can test improvement with a likelihood ratio test, though other related tests, such as a Wald test, are sometimes used. For nonparametric models or machine learning tools, other loss functions are often used, such as cross-entropy or deviance, which are functions of the log likelihood for binary outcomes [4].

Other likelihood-based measures do not directly perform a test of significance, but apply a penalty for added variables, such as the ...
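The likelihood ratio test and the cross-entropy loss mentioned in the abstract excerpt above can be sketched for a binary outcome as follows. This is an illustration only: it assumes the two log-likelihoods come from nested models fitted by maximum likelihood, that the extended model adds exactly one parameter (so the reference distribution is chi-square with 1 degree of freedom), and all names and numbers are hypothetical.

```python
import math


def bernoulli_log_likelihood(risks, events):
    """Log-likelihood of binary outcomes given predicted risks strictly between 0 and 1."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for p, y in zip(risks, events))


def cross_entropy(risks, events):
    """Average negative log-likelihood per observation."""
    return -bernoulli_log_likelihood(risks, events) / len(events)


def likelihood_ratio_test_1df(loglik_base, loglik_extended):
    """LRT statistic and p-value for one added parameter (chi-square, 1 df)."""
    statistic = 2 * (loglik_extended - loglik_base)
    # For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2))
    return statistic, p_value


# Hypothetical log-likelihoods from a base model and a model with one extra biomarker.
statistic, p_value = likelihood_ratio_test_1df(-100.0, -98.0795)
print(round(statistic, 3), round(p_value, 3))

# Cross-entropy for hypothetical predicted risks.
print(round(cross_entropy([0.8, 0.3, 0.1], [1, 0, 0]), 4))
```

For more than one added parameter the p-value needs the chi-square distribution with the matching degrees of freedom, which a statistics library would normally supply.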


“…Kerr and Janes [3] view the NRI at event rate (NRI(p)) as a member of the latter class of measures focused on clinical implications, whereas we would argue that NRI(p) is more closely aligned with the global metrics. Indeed, max RU (relative utility) is a robust measure of overall discrimination, which does not depend on the event rate itself (even though this event rate is used as a classification threshold), a point highlighted and further explored in [1]. As shown in [8], it possesses numerous appealing features, including its representation as a proper measure of statistical distance between risk distributions among events and non-events.…”

mentioning

confidence: 99%

Authors' response to comments

Pencina, Chipman, Steyerberg, et al. 2017

Statistics in Medicine


We are grateful to the authors who provided their insightful commentaries [1-5], which we hope will lead to more appropriate uses of the NRI and IDI metrics and their parent measures, the maximum relative utility, and discrimination slope. Here, we highlight common themes, clarify certain issues, and point out where we differ with some of the authors.

Together with the papers they reference [6-8], several of the authors re-iterate the importance of model calibration when using the NRI and IDI metrics. Emphasis is appropriately placed on using smooth calibration plots to assess calibration. As suggested by van Calster et al. [9] and further illustrated by the theoretical examples in [6,7], 'moderate' (or 'level 3') calibration should be required: For persons with the same predicted risk, the observed event rate equals the predicted risk. As pointed out in [1], there is room for further discussion on recalibration and how and when it should be performed. When evaluating added value of new markers, it is reasonable to assess it on a model that is calibrated in the moderate sense: if a decision is made to use the new markers, they would be incorporated into a model for which satisfactory calibration needs to be ascertained before it is recommended for general applications. At the same time, we recognize the decision-making conundrum where in some applications to new data, we may not know ahead of time the extent of potential miscalibration.

A key conclusion from the work of Chipman and Braun [6] is that calibration must be sufficiently met for the population of inference. If an investigator wishes to report the IDI in a clinically relevant subgroup, sufficient calibration must be established for that subgroup. If this is not the case, Simpson's Paradox may be observed in comparing the subgroup specific IDI to the IDI for the whole sample. Even small departures from calibration may lead to Simpson's Paradox; this further emphasizes the sensitivity of the discrimination slope to model calibration [2]. Of note, these issues are not resolved by using the weighted discrimination slope or IDI. While the weighted IDI removed the issue of Simpson's Paradox in simulations, under some settings, it yielded conclusions that were inconsistent with other measures including change in AUC, Brier Score, and other R-squared metrics [6]. Therefore, while illustrative, we do not recommend using it as a new metric.

An important feature of the R-squared metrics is that, unlike the AUC, they integrate both calibration and discrimination into one number. Because the AUC is equivalent to the rank-based Mann-Whitney statistic, it necessarily loses information that relates to the calibration of the risk estimates. Thus, the AUC is a more pure measure of discrimination. The discrimination slope, on the other hand, retains all the information. Yet because the slope is not a proper scoring rule, it is susceptible to misleading conclusions for poorly calibrated models [6,7].
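The contrast drawn above between the AUC and the discrimination slope can be illustrated with a small sketch. It computes the AUC as the Mann-Whitney proportion of event/non-event pairs that are ranked correctly (ties count one half) and shows that halving every predicted risk, a deliberately miscalibrating but order-preserving change, leaves the AUC untouched while the discrimination slope changes. The function names and toy risks are hypothetical.

```python
from statistics import mean


def split_by_outcome(risks, events):
    """Return (risks among events, risks among non-events)."""
    event_risks = [r for r, y in zip(risks, events) if y == 1]
    nonevent_risks = [r for r, y in zip(risks, events) if y == 0]
    return event_risks, nonevent_risks


def auc_mann_whitney(risks, events):
    """Proportion of event/non-event pairs ranked correctly; ties count 0.5."""
    event_risks, nonevent_risks = split_by_outcome(risks, events)
    score = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in event_risks
        for n in nonevent_risks
    )
    return score / (len(event_risks) * len(nonevent_risks))


def discrimination_slope(risks, events):
    """Mean predicted risk among events minus mean among non-events."""
    event_risks, nonevent_risks = split_by_outcome(risks, events)
    return mean(event_risks) - mean(nonevent_risks)


# Hypothetical toy data.
events = [1, 1, 0, 0, 0, 0]
risks = [0.5, 0.2, 0.3, 0.2, 0.1, 0.1]
halved = [r / 2 for r in risks]  # same ranking, different calibration

print(auc_mann_whitney(risks, events), auc_mann_whitney(halved, events))
print(round(discrimination_slope(risks, events), 4),
      round(discrimination_slope(halved, events), 4))
```

Because only the ordering of risks enters the AUC, any strictly increasing transformation of the risks gives the same value, whereas the slope depends on the risk scale itself.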


A model incorporating serum C3 complement levels may be useful for diagnosing biliary atresia in infants

Liang et al. 2022

Gastroenterología y Hepatología (English Edition)
