In prediction model research, external validation is needed to examine an existing model's performance using data independent of that used for model development. Current external validation studies often suffer from small sample sizes and consequently imprecise predictive performance estimates. To address this, we propose how to determine the minimum sample size needed for a new external validation study of a prediction model for a binary outcome. Our calculations aim to precisely estimate calibration (Observed/Expected and calibration slope), discrimination (C-statistic), and clinical utility (net benefit). For each measure, we propose closed-form and iterative solutions for calculating the minimum sample size required. These require specifying: (i) target SEs (confidence interval widths) for each estimate of interest, (ii) the anticipated outcome event proportion in the validation population, (iii) the prediction model's anticipated (mis)calibration and variance of linear predictor values in the validation population, and (iv) potential risk thresholds for clinical decision-making. The calculations can also be used to inform whether the sample size of an existing (already collected) dataset is adequate for external validation. We illustrate our proposal for external validation of a prediction model for mechanical heart valve failure with an expected outcome event proportion of 0.018. Calculations suggest at least 9835 participants (177 events) are required to precisely estimate the calibration and discrimination measures, with this number driven by the calibration slope criterion, which we anticipate will often be the case. Also, 6443 participants (116 events) are required to precisely estimate net benefit at a risk threshold of 8%. Software code is provided.
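The closed-form criterion for the Observed/Expected (O/E) ratio can be sketched as below. The approximation SE(ln(O/E)) ≈ √((1 − φ)/(nφ)), where φ is the anticipated event proportion, is assumed for this sketch, and the target SE of 0.10 is an illustrative choice, not a value quoted in the abstract above.

```python
import math

def n_for_oe_precision(phi, target_se_ln_oe):
    """Minimum validation sample size so that SE(ln(O/E)) is at most the
    target, assuming SE(ln(O/E)) ~ sqrt((1 - phi) / (n * phi)), where
    phi is the anticipated outcome event proportion."""
    return math.ceil((1 - phi) / (phi * target_se_ln_oe ** 2))

# event proportion 0.018 as in the heart valve case study; target SE illustrative
n = n_for_oe_precision(phi=0.018, target_se_ln_oe=0.10)
print(n)  # 5456
```

Tighter targets (smaller SEs) rapidly inflate the required sample size, which is why rare outcomes demand very large validation datasets.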
Objectives When developing a clinical prediction model, penalization techniques are recommended to address overfitting, as they shrink predictor effect estimates toward the null and reduce mean-square prediction error in new individuals. However, shrinkage and penalty terms (‘tuning parameters’) are estimated with uncertainty from the development data set. We examined the magnitude of this uncertainty and the subsequent impact on prediction model performance. Study Design and Setting This study comprises applied examples and a simulation study of the following methods: uniform shrinkage (estimated via a closed-form solution or bootstrapping), ridge regression, the lasso, and elastic net. Results In a particular model development data set, penalization methods can be unreliable because tuning parameters are estimated with large uncertainty. This is of most concern when development data sets have a small effective sample size and the model's Cox-Snell R² is low. The problem can lead to considerable miscalibration of model predictions in new individuals. Conclusion Penalization methods are not a ‘carte blanche’; they do not guarantee a reliable prediction model is developed. They are most unreliable when needed most (i.e., when overfitting may be large). We recommend applying them with large effective sample sizes, as identified from recent sample size calculations that aim to minimize the potential for model overfitting and precisely estimate key parameters.
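The bootstrap route to a uniform shrinkage factor can be sketched as follows: refit the model in each bootstrap resample, then take the calibration slope of that model's linear predictor when applied back to the original data. Everything below (the simulated data set, coefficient values, and number of resamples) is hypothetical, not taken from the study above.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson (X includes an intercept column).
    A tiny ridge term keeps the Hessian invertible in degenerate resamples."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ b, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-6 * np.eye(X.shape[1])
        b = b + np.linalg.solve(H, X.T @ (y - p))
    return b

# hypothetical small development data set
n, k = 150, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([-1.0, 0.5, -0.5, 0.3, 0.0, 0.2])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

# bootstrap the uniform shrinkage factor: refit in each resample, then
# compute the calibration slope of that model in the original data
slopes = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    b_boot = fit_logistic(X[idx], y[idx])
    lp = X @ b_boot
    slopes.append(fit_logistic(np.column_stack([np.ones(n), lp]), y)[1])
shrinkage = float(np.mean(slopes))  # typically < 1, flagging overfitting
```

Inspecting the spread of `slopes`, rather than only their mean, illustrates the paper's point: the tuning parameter itself is estimated with substantial uncertainty in small samples.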
Purpose: Intratumoral hypoxia and immunity have been correlated with patient outcome in various tumor settings. However, these factors are not currently considered for treatment selection in head and neck cancer (HNC) due to lack of validated biomarkers. Here we sought to develop a hypoxia-immune classifier with potential application in patient prognostication and prediction of response to targeted therapy. Experimental Design: A 54-gene hypoxia-immune signature was constructed on the basis of literature review. Gene expression was analyzed in silico using The Cancer Genome Atlas (TCGA) HNC dataset (n = 275) and validated using two independent cohorts (n = 130 and 123). IHC was used to investigate the utility of a simplified protein signature. The spatial distribution of hypoxia and immune markers was examined using multiplex immunofluorescence staining. Results: Unsupervised hierarchical clustering of the TCGA dataset (development cohort) identified three patient subgroups with distinct hypoxia-immune phenotypes and survival profiles: hypoxia-low/immune-high, hypoxia-high/immune-low, and mixed, with 5-year overall survival (OS) rates of 71%, 51%, and 49%, respectively (P = 0.0015). The prognostic relevance of the hypoxia-immune gene signature was replicated in two independent validation cohorts. Only PD-L1 and intratumoral CD3 protein expression were associated with improved OS on multivariate analysis. Hypoxia-low/immune-high and hypoxia-high/immune-low tumors were overrepresented in "inflamed" and "immune-desert" microenvironmental profiles, respectively.
Multiplex staining demonstrated an inverse correlation between CA-IX expression and prevalence of intratumoral CD3+ T cells (r = −0.5464; P = 0.0377), further corroborating the transcription-based classification. Conclusions: We developed and validated a hypoxia-immune prognostic transcriptional classifier, which may have clinical application to guide the use of hypoxia modification and targeted immunotherapies for the treatment of HNC.
Objective To examine the association between antihypertensive treatment and specific adverse events. Design Systematic review and meta-analysis. Eligibility criteria Randomised controlled trials of adults receiving antihypertensives compared with placebo or no treatment, more antihypertensive drugs compared with fewer antihypertensive drugs, or higher blood pressure targets compared with lower targets. To avoid small early phase trials, studies were required to have at least 650 patient years of follow-up. Information sources Searches were conducted in Embase, Medline, CENTRAL, and the Science Citation Index databases from inception until 14 April 2020. Main outcome measures The primary outcome was falls during trial follow-up. Secondary outcomes were acute kidney injury, fractures, gout, hyperkalaemia, hypokalaemia, hypotension, and syncope. Additional outcomes related to death and major cardiovascular events were extracted. Risk of bias was assessed using the Cochrane risk of bias tool, and random effects meta-analysis was used to pool rate ratios, odds ratios, and hazard ratios across studies, allowing for between study heterogeneity (τ²). Results Of 15 023 articles screened for inclusion, 58 randomised controlled trials were identified, including 280 638 participants followed up for a median of 3 (interquartile range 2-4) years. Most of the trials (n=40, 69%) had a low risk of bias. Among seven trials reporting data for falls, no evidence was found of an association with antihypertensive treatment (summary risk ratio 1.05, 95% confidence interval 0.89 to 1.24, τ²=0.009). Antihypertensives were associated with an increased risk of acute kidney injury (1.18, 95% confidence interval 1.01 to 1.39, τ²=0.037, n=15), hyperkalaemia (1.89, 1.56 to 2.30, τ²=0.122, n=26), hypotension (1.97, 1.67 to 2.32, τ²=0.132, n=35), and syncope (1.28, 1.03 to 1.59, τ²=0.050, n=16).
The heterogeneity between studies assessing acute kidney injury and hyperkalaemia events was reduced when focusing on drugs that affect the renin angiotensin-aldosterone system. Results were robust to sensitivity analyses focusing on adverse events leading to withdrawal from each trial. Antihypertensive treatment was associated with a reduced risk of all cause mortality, cardiovascular death, and stroke, but not of myocardial infarction. Conclusions This meta-analysis found no evidence to suggest that antihypertensive treatment is associated with falls but found evidence of an association with mild (hyperkalaemia, hypotension) and severe adverse events (acute kidney injury, syncope). These data could be used to inform shared decision making between doctors and patients about initiation and continuation of antihypertensive treatment, especially in patients at high risk of harm because of previous adverse events or poor renal function. Registration PROSPERO CRD42018116860.
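The random-effects pooling with a between-study heterogeneity term τ² described above can be sketched with the DerSimonian-Laird estimator. The three log rate ratios and standard errors below are hypothetical inputs for illustration, not trial data from the review.

```python
import math

def dl_pool(log_effects, ses):
    """DerSimonian-Laird random-effects pooling of log effect sizes.
    Returns (pooled log effect, its SE, tau^2)."""
    k = len(log_effects)
    w = [1 / s ** 2 for s in ses]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                  # method-of-moments tau^2
    w_re = [1 / (s ** 2 + tau2) for s in ses]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2

# hypothetical log rate ratios and SEs from three trials
pooled, se, tau2 = dl_pool([0.18, 0.05, 0.30], [0.10, 0.12, 0.15])
rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

Pooling on the log scale and exponentiating at the end is what yields the summary rate/odds/hazard ratios and confidence intervals reported in the abstract.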
Introduction: Sample size "rules-of-thumb" for external validation of clinical prediction models suggest at least 100 events and 100 non-events. Such blanket guidance is imprecise, and not specific to the model or validation setting. We investigate factors affecting precision of model performance estimates upon external validation, and propose a more tailored sample size approach. Methods: Simulation of logistic regression prediction models to investigate factors associated with precision of performance estimates. Then, explanation and illustration of a simulation-based approach to calculate the minimum sample size required to precisely estimate a model's calibration, discrimination and clinical utility. Results: Precision is affected by the model's linear predictor (LP) distribution, in addition to number of events and total sample size. Sample sizes of 100 (or even 200) events and non-events can give imprecise estimates, especially for calibration. The simulation-based calculation accounts for the LP distribution and (mis)calibration in the validation sample. Application identifies 2430 required participants (531 events) for external validation of a deep vein thrombosis diagnostic model. Conclusion: Where researchers can anticipate the distribution of the model's LP (eg, based on development sample, or a pilot study), a simulation-based approach for calculating sample size for external validation offers more flexibility and reliability than rules-of-thumb.
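The simulation-based idea can be sketched as follows: assume a distribution for the model's LP, simulate validation samples of a candidate size, fit the calibration slope model in each, and record the average 95% CI width. All settings below (normal LP, perfect calibration, the chosen mean/SD, and candidate n) are illustrative assumptions, not the paper's case-study values.

```python
import numpy as np

rng = np.random.default_rng(7)

def expit(x):
    return 1 / (1 + np.exp(-x))

def slope_ci_width(n, lp_mean, lp_sd, nsim=100):
    """Average 95% CI width for the calibration slope when a (perfectly
    calibrated) model with LP ~ N(lp_mean, lp_sd) is validated on n people."""
    widths = []
    for _ in range(nsim):
        lp = rng.normal(lp_mean, lp_sd, n)
        y = rng.binomial(1, expit(lp))
        X = np.column_stack([np.ones(n), lp])
        b = np.zeros(2)
        for _ in range(25):  # Newton-Raphson fit of the calibration model
            p = expit(np.clip(X @ b, -30, 30))
            H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-6 * np.eye(2)
            b = b + np.linalg.solve(H, X.T @ (y - p))
        p = expit(np.clip(X @ b, -30, 30))  # SE of the slope from the Hessian
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-6 * np.eye(2)
        widths.append(2 * 1.96 * np.sqrt(np.linalg.inv(H)[1, 1]))
    return float(np.mean(widths))

width_500 = slope_ci_width(n=500, lp_mean=-2.0, lp_sd=1.0)
```

In practice one would increase `n` until the anticipated CI width falls below a pre-specified target, which is exactly the tailoring over rules-of-thumb that the abstract argues for.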
Clinical prediction models provide individualized outcome predictions to inform patient counseling and clinical decision making. External validation is the process of examining a prediction model's performance in data independent of that used for model development. Current external validation studies often suffer from small sample sizes, and subsequently imprecise estimates of a model's predictive performance. To address this, we propose how to determine the minimum sample size needed for external validation of a clinical prediction model with a continuous outcome. Four criteria are proposed that target precise estimates of (i) R² (the proportion of variance explained), (ii) calibration-in-the-large (agreement between predicted and observed outcome values on average), (iii) calibration slope (agreement between predicted and observed values across the range of predicted values), and (iv) the variance of observed outcome values. Closed-form sample size solutions are derived for each criterion, which require the user to specify anticipated values of the model's performance (in particular R²) and the outcome variance in the external validation dataset. A sensible starting point is to base values on those for the model development study, as obtained from the publication or study authors. The largest sample size required to meet all four criteria is the recommended minimum sample size needed in the external validation dataset. The calculations can also be applied to estimate expected precision when an existing dataset with a fixed sample size is available, to help gauge if it is adequate. We illustrate the proposed methods on a case-study predicting fat-free mass in children.
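The R² criterion admits a simple closed form. The sketch below uses the standard large-sample approximation var(R²) ≈ 4R²(1 − R²)²/n rather than the exact expression, and the anticipated R² of 0.5 and target SE of 0.05 are illustrative values, not the case-study's.

```python
import math

def n_for_r2_precision(r2, target_se):
    """Minimum n so that SE(R^2) is at most target_se, using the
    large-sample approximation var(R^2) ~ 4 * R^2 * (1 - R^2)^2 / n."""
    return math.ceil(4 * r2 * (1 - r2) ** 2 / target_se ** 2)

# anticipated R^2 and target SE are illustrative inputs
n = n_for_r2_precision(r2=0.5, target_se=0.05)
print(n)  # 200
```

Running the analogous calculation for each of the four criteria and taking the maximum gives the recommended minimum validation sample size.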
Background Ventral hernias are increasing in prevalence and many recur despite attempted repair. To date, much of the literature is underpowered and divergent. As a result, there is limited high-quality evidence to inform surgeons succinctly about which perioperative variables influence postoperative recurrence. This systematic review aimed to identify predictors of ventral hernia recurrence. Methods PubMed was searched for studies reporting prognostic data of ventral hernia recurrence between 1 January 1995 and 1 January 2018. Extracted data described hernia type (primary/incisional), definitions of recurrence, methods used to detect recurrence, duration of follow-up, and co-morbidity. Data were extracted for all potential predictors, estimates and thresholds described. Random-effects meta-analysis was used. Bias was assessed with a modified PROBAST (Prediction model Risk Of Bias ASsessment Tool). Results Screening of 18 214 abstracts yielded 274 individual studies for inclusion. Hernia recurrence was defined in 66 studies (24.1 per cent), using 41 different unstandardized definitions. Three patient variables (female sex, age 65 years or less, and BMI greater than 25, 30, 35 or 40 kg/m2), five patient co-morbidities (smoking, diabetes, chronic obstructive pulmonary disease, ASA grade III–IV, steroid use), two hernia-related variables (incisional/primary, recurrent/primary), six intraoperative variables (biological mesh, bridged repair, open versus laparoscopic surgery, suture versus mesh repair, onlay/retrorectus, intraperitoneal/retrorectus), and six postoperative variables (any complication, surgical-site occurrence, wound infection, seroma, haematoma, wound dehiscence) were identified as significant prognostic factors for hernia recurrence. Conclusion This study summarized the current evidence base for predicting ventral hernia recurrence. Results should inform best practice and future research.
Previous articles in Statistics in Medicine describe how to calculate the sample size required for external validation of prediction models with continuous and binary outcomes. The minimum sample size criteria aim to ensure precise estimation of key measures of a model's predictive performance, including measures of calibration, discrimination, and net benefit. Here, we extend the sample size guidance to prediction models with a time-to-event (survival) outcome, to cover external validation in datasets containing censoring. A simulation-based framework is proposed, which calculates the sample size required to target a particular confidence interval width for the calibration slope measuring the agreement between predicted risks (from the model) and observed risks (derived using pseudo-observations to account for censoring) on the log cumulative hazard scale. Precise estimation of calibration curves, discrimination, and net-benefit can also be checked in this framework. The process requires assumptions about the validation population in terms of the (i) distribution of the model's linear predictor and (ii) event and censoring distributions. Existing information can inform this; in particular, the linear predictor distribution can be approximated using the C-index or Royston's D statistic from the model development article, together with the overall event risk. We demonstrate how the approach can be used to calculate the sample size required to validate a prediction model for recurrent venous thromboembolism. Ideally the sample size should ensure precise calibration across the entire range of predicted risks, but must at least ensure adequate precision in regions important for clinical decision-making. Stata and R code are provided.
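The pseudo-observation device mentioned above can be sketched directly: jackknife the Kaplan-Meier estimator to obtain, for each individual, a censoring-free "observed" survival value at the time point of interest. This is a minimal numpy implementation for illustration, not the authors' software.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t); events precede censorings at ties."""
    order = np.lexsort((-event, time))
    time, event = time[order], event[order]
    s, at_risk = 1.0, len(time)
    for i in range(len(time)):
        if time[i] > t:
            break
        if event[i] == 1:
            s *= 1 - 1 / at_risk
        at_risk -= 1
    return s

def pseudo_obs(time, event, t):
    """Jackknife pseudo-observations for S(t): n*S - (n-1)*S_(-i).
    With no censoring these reduce to the indicators 1(T_i > t)."""
    n = len(time)
    s_full = km_survival(time, event, t)
    po = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False
        po[i] = n * s_full - (n - 1) * km_survival(time[mask], event[mask], t)
    return po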
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.