Previous articles in Statistics in Medicine describe how to calculate the sample size required for external validation of prediction models with continuous and binary outcomes. The minimum sample size criteria aim to ensure precise estimation of key measures of a model's predictive performance, including measures of calibration, discrimination, and net benefit. Here, we extend the sample size guidance to prediction models with a time-to-event (survival) outcome, to cover external validation in datasets containing censoring. A simulation-based framework is proposed, which calculates the sample size required to target a particular confidence interval width for the calibration slope measuring the agreement between predicted risks (from the model) and observed risks (derived using pseudo-observations to account for censoring) on the log cumulative hazard scale. Precise estimation of calibration curves, discrimination, and net benefit can also be checked in this framework. The process requires assumptions about the validation population in terms of the (i) distribution of the model's linear predictor and (ii) event and censoring distributions. Existing information can inform this; in particular, the linear predictor distribution can be approximated using the C-index or Royston's D statistic from the model development article, together with the overall event risk. We demonstrate how the approach can be used to calculate the sample size required to validate a prediction model for recurrent venous thromboembolism. Ideally the sample size should ensure precise calibration across the entire range of predicted risks, but must at least ensure adequate precision in regions important for clinical decision-making. Stata and R code are provided.
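The data-generation step of such a simulation can be sketched as follows. This is a minimal illustration, not the authors' Stata or R code: it assumes a normally distributed linear predictor, exponential event times under proportional hazards, and exponential censoring with administrative censoring at a fixed horizon. The function names and all parameter values are hypothetical. The conversion from Royston's D to the linear predictor standard deviation uses the commonly cited relationship D = σ·√(8/π), which holds when the linear predictor is normally distributed. Estimating the calibration slope from pseudo-observations, which the abstract describes, is not shown.

```python
import math
import random

# Scaling constant in Royston's D for a normally distributed linear predictor
KAPPA = math.sqrt(8.0 / math.pi)


def sigma_from_royston_d(d):
    """SD of a normal linear predictor implied by Royston's D (D = sigma * KAPPA)."""
    return d / KAPPA


def simulate_validation_data(n, lp_mean, lp_sd, base_rate, cens_rate, horizon, seed=0):
    """Simulate (linear predictor, observed time, event indicator) for n individuals.

    Assumptions (illustrative only): exponential event times with hazard
    base_rate * exp(lp), exponential censoring with rate cens_rate, and
    administrative censoring at `horizon`.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        lp = rng.gauss(lp_mean, lp_sd)
        event_time = rng.expovariate(base_rate * math.exp(lp))
        cens_time = min(rng.expovariate(cens_rate), horizon)
        time = min(event_time, cens_time)
        event = 1 if event_time <= cens_time else 0
        rows.append((lp, time, event))
    return rows


def true_risk(lp, base_rate, horizon):
    """Event risk by `horizon` under the assumed exponential model."""
    return 1.0 - math.exp(-base_rate * math.exp(lp) * horizon)


if __name__ == "__main__":
    sd = sigma_from_royston_d(1.2)  # hypothetical D from a development article
    data = simulate_validation_data(1000, 0.0, sd, 0.05, 0.02, 5.0, seed=1)
    print("observed events:", sum(e for _, _, e in data))
```

In a full simulation, datasets like this would be generated repeatedly at a candidate sample size, the performance measure of interest estimated in each, and the sample size increased until the resulting confidence interval width meets the target.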
Highlights: Estimates obtained from a flexible parametric model are not oversensitive to the number of knots used to create the splines. Non-proportional hazards can easily be incorporated in the model, and the estimates remain insensitive to the number of knots. Flexible parametric models have advantages over other models, such as the Cox model, for obtaining useful predictions. Online interactive graphs are a powerful tool that enables users to improve their understanding of findings.
In a competing risks analysis, interest lies in the cause-specific cumulative incidence function (CIF), which can be calculated either (1) by transforming the cause-specific hazards or (2) through its direct relationship with the subdistribution hazard. We expand on current competing risks methodology from within the flexible parametric survival modelling framework (FPM) and focus on approach (2). This models all cause-specific CIFs simultaneously and is more useful for questions of prognosis. We also extend cure models using an approach similar to that described by Andersson et al for flexible parametric relative survival models. Using SEER public use colorectal data, we compare and contrast our approach with standard methods such as the Fine & Gray model and show that many useful out-of-sample predictions can be made after modelling the cause-specific CIFs using an FPM approach. Alternative link functions, such as the logit link, may also be incorporated. Models can also be easily extended for time-dependent effects.
In a competing risks analysis, interest lies in the cause-specific cumulative incidence function (CIF), which is usually obtained in a modelling framework either (1) by transforming all of the cause-specific hazards (CSH) or (2) through its direct relationship with the subdistribution hazard (SDH) function. We expand on current competing risks methodology from within the flexible parametric survival modelling framework (FPM) and focus on approach (2). This models all cause-specific CIFs simultaneously and is more useful when prognostic questions are to be answered. We propose the direct FPM approach for the cause-specific CIF, which models the (log-cumulative) baseline hazard without requiring numerical integration, reducing computational time. It is also easy to make out-of-sample predictions to estimate more useful measures, and alternative link functions, for example the logit link, can be incorporated. To implement the methods, a new estimation command, stpm2cr, is introduced, and useful predictions from the model are demonstrated through an illustrative melanoma dataset.
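For contrast with the direct approach, approach (1) can be illustrated with a small sketch. It uses the standard identity CIF_k(t) = ∫₀ᵗ S(u) h_k(u) du, where S(u) = exp(−Σ_j H_j(u)) is the all-cause survival function. The example assumes constant cause-specific hazards, purely so the numerical result can be checked against a closed form; the function names and hazard values are hypothetical and are not part of stpm2cr.

```python
import math


def cif_from_csh(hazards, cause, t, steps=10000):
    """CIF for `cause` at time t by trapezoidal integration of S(u) * h_k(u).

    `hazards` is a list of constant cause-specific hazards (an assumption
    made here for illustration only).
    """
    total = sum(hazards)
    width = t / steps

    def integrand(u):
        # All-cause survival exp(-total * u) times the hazard of the chosen cause
        return math.exp(-total * u) * hazards[cause]

    area = 0.5 * (integrand(0.0) + integrand(t))
    for i in range(1, steps):
        area += integrand(i * width)
    return area * width


def cif_closed_form(hazards, cause, t):
    """Closed-form CIF under constant hazards: (h_k / h.) * (1 - exp(-h. * t))."""
    total = sum(hazards)
    return hazards[cause] / total * (1.0 - math.exp(-total * t))


if __name__ == "__main__":
    hazards = [0.10, 0.05]  # hypothetical cause-specific hazards per year
    for k in range(len(hazards)):
        print(k, cif_from_csh(hazards, k, 5.0), cif_closed_form(hazards, k, 5.0))
```

The integration step is what approach (1) generally needs when the hazards are not constant. Modelling the CIF directly, as in approach (2), avoids it.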
Background: Cancer survival statistics are typically reported using measures that discount the impact of other-cause mortality, such as net survival. This is a hypothetical measure and is interpreted as excluding the possibility of cancer patients dying from other causes. Crude probability of death partitions the all-cause probability of death into deaths from cancer and other causes. Methods: The National Cancer Registration and Analysis Service is the single cancer registry for England. In 2006–2015, 1,590,477 malignant tumours were diagnosed for breast, colorectal, lung, melanoma and prostate cancer in adults. We used a relative survival framework, with a period approach, providing estimates for up to 10-year survival. Mortality was partitioned into deaths due to cancer or other causes. Unconditional and conditional (on surviving 1 year and 5 years) crude probabilities of death were estimated for the five cancers. Results: Elderly patients who survived for a longer period before dying were more likely to die from other causes of death (except for lung cancer). For younger patients, deaths were almost entirely due to the cancer. Conclusion: There are different measures of survival, each with its own strengths and limitations. Careful choices of survival measures are needed for specific scenarios to maximise the understanding of the data.