2018
DOI: 10.1016/j.prevetmed.2017.11.018
A discussion of calibration techniques for evaluating binary and categorical predictive models

Cited by 76 publications (51 citation statements)
References 21 publications
“…To characterize model calibration and fit, we also calculated the calibration slope and intercept. Calibration slopes differing from one suggest model overfitting to the training data set, while calibration intercepts differing from 0 suggest systematic bias toward under‐ or overpredicting the risk of mortality . Analysis was performed using Stata/IC 14.2 (College Station, TX: StataCorp, LP) and R version 3.4.3 (R Foundation for Statistical Computing, Vienna, Austria) with packages pROC, caret, cvAUC, DMwR, MICE, and rms …”
Section: Methods
Confidence: 99%
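The quoted study computed these quantities with R packages such as rms; as a rough illustration only (not the cited paper's implementation, and with an illustrative function name), the calibration slope and intercept can be obtained by fitting a logistic regression of the observed outcome on the logit of the predicted probabilities. A minimal pure-Python sketch using Newton-Raphson:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_slope_intercept(y, p_hat, iters=25):
    """Fit logistic regression y ~ a + b * logit(p_hat) by Newton-Raphson.
    b is the calibration slope (ideal: 1, < 1 suggests overfitting);
    a is the calibration intercept (ideal: 0, nonzero suggests bias)."""
    x = [logit(p) for p in p_hat]
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient and Hessian of the Bernoulli log-likelihood
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))  # fitted probability
            w = mu * (1.0 - mu)                          # IRLS weight
            ga += yi - mu
            gb += (yi - mu) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        if det == 0:
            break
        # Newton step: (a, b) += H^{-1} * gradient
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b
```

On data whose observed event rates exactly match the predicted probabilities, the fit recovers an intercept of 0 and a slope of 1.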
“…For model discrimination and calibration, the C-statistic was calculated from the receiver-operator characteristic curve to assess model discrimination, where a C-statistic of 0.50 or more indicates acceptable predictive power [37]. Hosmer-Lemeshow goodness-of-fit tests were applied and the calibration plots were generated for each model to assess model calibration [38].…”
Section: Statistical Analyses
Confidence: 99%
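The two measures quoted above are straightforward to compute directly. As a minimal sketch (illustrative function names, not the cited study's code): the C-statistic is the proportion of (event, non-event) pairs in which the event received the higher predicted probability, and the Hosmer-Lemeshow statistic compares observed and expected event counts within risk groups.

```python
def c_statistic(y, p_hat):
    """C-statistic (ROC AUC): fraction of (event, non-event) pairs where
    the event has the higher predicted probability; ties count 0.5."""
    pos = [p for p, yi in zip(p_hat, y) if yi == 1]
    neg = [p for p, yi in zip(p_hat, y) if yi == 0]
    concordant = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return concordant / (len(pos) * len(neg))

def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow statistic: sort by predicted risk, split into
    equal-size groups, and sum (observed - expected)^2 / variance.
    Compare against a chi-square distribution with groups - 2 df."""
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        m = len(chunk)
        obs = sum(yi for _, yi in chunk)   # observed events in group
        exp = sum(p for p, _ in chunk)     # expected events in group
        pbar = exp / m
        if 0 < pbar < 1:
            stat += (obs - exp) ** 2 / (m * pbar * (1 - pbar))
    return stat
```

A well-calibrated model yields a small Hosmer-Lemeshow statistic (a large value, relative to the chi-square reference, signals miscalibration), while a C-statistic of 0.5 corresponds to chance-level discrimination.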
“…We referred to the values of (1) calibration-in-the-large-comparison of the average of all predicted probabilities with the average observed depression cases in the Nepali dataset, with values closer to zero indicating better model performance; and (2) calibration slope-measure of agreement between observed depression and predicted risk of depression for all predictors in the Nepali dataset (a perfect model has a calibration slope of 1; [53]). A Chi-square test to measure unreliability of the calibration accuracy was performed to assess whether there was a statistically significant difference between the model predictions and the 45° line [53]. We assessed discrimination using the receiver operator characteristic (ROC) curve.…”
Section: Evaluation Of Model Performance
Confidence: 99%
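Calibration-in-the-large, as described in the quote, reduces to comparing the average predicted risk with the observed event rate. A one-line sketch (illustrative function name, simple mean difference rather than the intercept-based formulation some authors use):

```python
def calibration_in_the_large(y, p_hat):
    """Observed event rate minus mean predicted risk; values near zero
    indicate good overall agreement between predictions and outcomes."""
    return sum(y) / len(y) - sum(p_hat) / len(p_hat)
```

For example, a model that predicts 0.5 for every subject in a sample with a 50% event rate scores exactly zero, even if its discrimination is poor, which is why calibration-in-the-large is reported alongside the calibration slope and the ROC curve rather than in place of them.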