Calibration is one of the main properties that any predictive model must satisfy. Overcoming the limitations of many approaches developed so far, a recent study proposed the calibration belt as a graphical tool to identify ranges of probability where a model for dichotomous outcomes is miscalibrated. In this approach, the relationship between the logit of the probability predicted by a model and the logit of the event rate observed in a sample is represented by a polynomial function, whose coefficients are fitted and whose degree is selected by a series of likelihood-ratio tests. We propose here a test associated with the calibration belt and show how the algorithm that selects the polynomial degree affects the distribution of the test statistic. We calculate its exact distribution and confirm its validity via numerical simulation. Starting from this distribution, we finally reappraise the procedure to construct the calibration belt and illustrate an application in the medical context.
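The degree-selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the forward stepwise scheme, the chi-squared reference distribution with one degree of freedom per step, and the maximum degree of 4 are all illustrative assumptions.

```python
# Minimal sketch (not the published algorithm): fit polynomials of
# increasing degree in g = logit(p_hat) to the binary outcome y, and
# stop when a likelihood-ratio test no longer rejects the lower degree.
import numpy as np
from scipy.special import logit, expit
from scipy.stats import chi2
from scipy.optimize import minimize

def neg_log_lik(beta, g, y):
    # Negative logistic log-likelihood for a polynomial in g
    eta = sum(b * g**k for k, b in enumerate(beta))
    p = np.clip(expit(eta), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def select_degree(p_hat, y, max_degree=4, alpha=0.05):
    g = logit(np.clip(p_hat, 1e-10, 1 - 1e-10))
    fits = []
    for m in range(1, max_degree + 1):
        res = minimize(neg_log_lik, np.zeros(m + 1), args=(g, y),
                       method="BFGS")
        fits.append(res.fun)
        if m > 1:
            # LR statistic comparing degree m against degree m - 1
            lr = 2 * (fits[-2] - fits[-1])
            if chi2.sf(lr, df=1) > alpha:
                return m - 1  # keep the simpler polynomial
    return max_degree
```

For a well-calibrated model the selected degree will usually be 1, with fitted coefficients close to (0, 1).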
Objectives: To evaluate the accuracy of the peroneal nerve test (PENT) in the diagnosis of critical illness polyneuropathy (CIP) and myopathy (CIM) in the intensive care unit (ICU). We hypothesised that abnormal reduction of peroneal compound muscle action potential (CMAP) amplitude predicts CIP/CIM diagnosed using a complete nerve conduction study and electromyography (NCS-EMG) as a reference diagnostic standard.
Design: Prospective observational study.
Setting: Nine Italian ICUs.
Patients: One hundred and twenty-one adult (≥18 years) critically ill patients, 106 neurologic and 15 non-neurologic, with an ICU stay of at least 3 days.
Interventions: None.
Measurements and main results: Patients underwent PENT and NCS-EMG testing on the same day, conducted by two independent clinicians, each blind to the results of the other test. Cases were considered true negative if both NCS-EMG and PENT measurements were normal, and true positive if the PENT result was abnormal and NCS-EMG showed symmetric abnormal findings, independently of the specific NCS-EMG diagnosis (CIP, CIM, or combined CIP and CIM). All data were centrally reviewed, and diagnoses were evaluated for consistency with predefined electrophysiological diagnostic criteria for CIP/CIM. During the study period, 342 patients were evaluated, 124 (36.3%) were enrolled, and 121 individuals with no protocol violation were studied. Sensitivity and specificity of PENT were 100% (95% CI 96.1-100.0) and 85.2% (95% CI 66.3-95.8), respectively. All 23 patients with normal results presented normal values on both tests, with no false negatives. Of the 97 patients with abnormal results, 93 had abnormal values on both tests (true positives), whereas four with abnormal PENT findings had only isolated peroneal nerve neuropathy on complete NCS-EMG (false positives).
Conclusions: PENT has 100% sensitivity and high specificity, and can be used as a screening test to diagnose CIP/CIM in the ICU.
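The accuracy figures in this abstract can be reproduced from the reported counts (93 true positives, 0 false negatives, 23 true negatives, 4 false positives). The sketch below assumes the intervals are exact Clopper-Pearson intervals, which matches the reported 96.1% lower bound for 93/93.

```python
# Recomputing the PENT sensitivity/specificity and their 95% CIs from
# the counts reported in the abstract, assuming Clopper-Pearson
# (exact) intervals for a binomial proportion k/n.
from scipy.stats import beta

def proportion_ci(k, n, alpha=0.05):
    """Point estimate and exact Clopper-Pearson CI for k successes in n."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return k / n, lo, hi

tp, fn, tn, fp = 93, 0, 23, 4
sens, s_lo, s_hi = proportion_ci(tp, tp + fn)  # abstract: 100% (96.1-100.0)
spec, p_lo, p_hi = proportion_ci(tn, tn + fp)  # abstract: 85.2% (66.3-95.8)
```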
Evaluating the goodness of fit of logistic regression models is crucial to ensure the accuracy of the estimated probabilities. Unfortunately, such evaluation is problematic in large samples. Because the power of traditional goodness of fit tests increases with the sample size, practically irrelevant discrepancies between estimated and true probabilities are increasingly likely to cause the rejection of the hypothesis of perfect fit in larger and larger samples. This phenomenon has been widely documented for popular goodness of fit tests, such as the Hosmer‐Lemeshow test. To address this limitation, we propose a modification of the Hosmer‐Lemeshow approach. By standardizing the noncentrality parameter that characterizes the alternative distribution of the Hosmer‐Lemeshow statistic, we introduce a parameter that measures the goodness of fit of a model but does not depend on the sample size. We provide the methodology to estimate this parameter and construct confidence intervals for it. Finally, we propose a formal statistical test to rigorously assess whether the fit of a model, albeit not perfect, is acceptable for practical purposes. The proposed method is compared in a simulation study with a competing modification of the Hosmer‐Lemeshow test, based on repeated subsampling. We provide a step‐by‐step illustration of our method using a model for postneonatal mortality developed in a large cohort of more than 300 000 observations.
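For reference, the standard Hosmer-Lemeshow statistic whose large-sample behaviour the abstract addresses can be sketched as follows. This is a textbook-style illustration, not the proposed modification: the decile grouping and the g − 2 degrees of freedom are the conventional choices.

```python
# Minimal sketch of the classical Hosmer-Lemeshow test: sort by
# predicted risk, split into g (roughly equal) groups, and compare
# observed vs expected event counts with a chi-squared statistic.
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(p_hat, y, groups=10):
    order = np.argsort(p_hat)
    p_sorted, y_sorted = p_hat[order], y[order]
    idx = np.array_split(np.arange(len(y)), groups)
    stat = 0.0
    for g in idx:
        obs = y_sorted[g].sum()          # observed events in the group
        exp = p_sorted[g].sum()          # expected events in the group
        n_g = len(g)
        pbar = exp / n_g                 # mean predicted risk
        stat += (obs - exp) ** 2 / (n_g * pbar * (1 - pbar))
    df = groups - 2
    return stat, chi2.sf(stat, df)
```

Because the statistic grows with n for any fixed miscalibration, the p-value of a huge sample rejects even trivially small discrepancies, which is the motivation for the standardized noncentrality parameter proposed in the abstract.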
A prognostic model is well calibrated when it accurately predicts event rates. This is first determined by testing for goodness of fit on the development dataset. All existing tests and graphical tools designed for this purpose suffer from several drawbacks, related mainly to the subgrouping of observations or to heavy dependence on arbitrary parameters. We propose a statistical test and a graphical method to assess the goodness of fit of logistic regression models, obtained through an extension of similar techniques developed for external validation. We analytically computed and numerically verified the distribution of the underlying statistic. Simulations on a set of realistic scenarios show that this test and the well-known Hosmer-Lemeshow approach have similar type I error rates. The main advantage of this new approach is that the relationship between model predictions and outcome rates across the range of probabilities can be represented in the calibration belt plot, together with its statistical confidence. By readily spotting any deviations from the perfect fit, this new graphical tool is designed to identify, during the process of model development, poorly modeled variables that call for further investigation. This is illustrated through an example based on real data.
The worldwide spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) prompted the World Health Organization to declare a pandemic. The molecular diagnosis of SARS-CoV-2 infection is based on the detection of viral RNA in different biological specimens. Unfortunately, the test may require several hours to perform. In the present study, we evaluated the diagnostic accuracy of lung point-of-care ultrasound (POCUS) for SARS-CoV-2 pneumonia in a cohort of symptomatic patients admitted to one emergency department (ED) in a high-prevalence setting. This retrospective study enrolled all patients who visited one ED with suspected respiratory infection in March 2020. All patients were tested (usually twice if the first test was negative) for SARS-CoV-2 on ED admission. The reference standard was considered positive if at least one specimen was positive; if all specimens tested negative, the reference was considered negative. Diagnostic accuracy was evaluated using sensitivity, specificity, and positive and negative predictive values. Of the 444 symptomatic patients admitted to the ED during the study period, the result of the lung POCUS test was available for 384 (86.5%). The sensitivity of the test was 92.0% (95% CI 88.2–94.9%), and the specificity was 64.9% (95% CI 54.6–74.4%). We observed a prevalence of SARS-CoV-2 infection of 74.7%. In this setting, the positive and negative predictive values were 88.6% (95% CI 84.4–92.0%) and 73.3% (95% CI 62.6–82.2%), respectively. Lung POCUS is a sensitive first-line screening tool for ED patients presenting with symptoms suggestive of SARS-CoV-2 infection.
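The predictive values reported in this abstract follow from the sensitivity, specificity, and prevalence via Bayes' rule, as the short check below shows (the helper name is illustrative).

```python
# Sanity check: with sensitivity 92.0%, specificity 64.9% and
# prevalence 74.7%, Bayes' rule reproduces the reported PPV and NPV
# to within rounding.
def predictive_values(sens, spec, prev):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

ppv, npv = predictive_values(0.920, 0.649, 0.747)  # abstract: 88.6%, 73.3%
```

This also makes the abstract's point concrete: with a high prevalence (74.7%), even a modest specificity still yields a high PPV, while the NPV is lower than the sensitivity alone would suggest.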
The calibration belt is a graphical approach designed to evaluate the goodness of fit of binary-outcome models such as logistic regression models. The calibration belt examines the relationship between estimated probabilities and observed outcome rates; significant deviations from perfect calibration can be spotted on the graph. The graphical approach is paired with a statistical test, synthesizing the calibration assessment in a standard hypothesis-testing framework. In this article, we present the calibrationbelt command, which implements the calibration belt and its associated test in Stata.