Summary
A large number of severity of illness scoring systems have been developed, and they are widely used in intensive care practice. However, they are complex systems with their basis in mathematics. To use such systems effectively, it is important to appreciate what factors influence their performance so that they can be compared fairly and used most appropriately. The purpose of this review is to describe the methods commonly used to assess the various facets of performance in severity of illness scoring systems. The performance of the most frequently used scoring systems in adult intensive care practice is presented. The shortfalls, misuse and strengths of scoring systems are also discussed.

Keywords Intensive care; severity of illness scoring systems.

Severity of illness scores stratify critically ill patients, provide meaningful information in many clinical contexts and collate clinical practice. Generally, severity of illness scores measure the degree of illness and reflect the complexity of the disease process. However, their use has been extended so that they may also be used to predict and compare outcomes, allocate resources and examine the process of care. There is little doubt that severity scoring systems have revolutionised intensive care. However, their limitations include a failure to predict functional status or quality of life after critical illness.

As with any tool or model, it is important that the correct severity scoring system is selected and then applied in the way its developers intended. Therefore, the purpose of this review is to analyse critically the development and performance of commonly employed intensive care severity of illness scores. The commonly used general adult severity of illness scores, measuring severity of illness at a fixed point and over time, will be described. Finally, their limitations and misuse will be briefly presented.
Appraisal
The critical appraisal of the development of a severity of illness score involves the measurement of accuracy (calibration and discrimination), reliability, content validity and methodological rigour.
© 1998 Blackwell Science Ltd

Accuracy - calibration
Calibration refers to how closely the estimated probabilities of mortality generated by the severity scoring system agree with actual mortality over the entire range of probabilities. In other words, it is the accuracy of measurement for every interval of measurement. Calibration is usually tested with a 'goodness of fit' test, where a large p value is sought, suggesting that patients predicted to die and those who actually die come from the same population. One such goodness of fit test is the Hosmer-Lemeshow C statistic [6], and an example of the results is shown in Table 1.

The Hosmer-Lemeshow goodness of fit C statistic compares the observed and expected frequencies over the entire range of deciles of risk from low to high and expresses the likelihood of the distributions being different using the chi-squared statistic. In the example given, the p value is 0.591, sug...
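
The decile comparison described above can be illustrated with a minimal sketch. It is not the implementation behind Table 1. It assumes that patients are sorted by predicted probability of death and split into groups of roughly equal size, that the statistic is referred to a chi-squared distribution with (groups - 2) degrees of freedom (the convention usually quoted for a development sample), and that the number of groups is even so that a closed-form tail probability can be used. The function names `chi2_sf_even_df` and `hosmer_lemeshow_c` are hypothetical and chosen only for this illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable.

    Uses the closed form that is exact when df is a positive even integer:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow_c(predicted, died, groups=10):
    """Return (C statistic, degrees of freedom, p value).

    predicted: estimated probabilities of death, each strictly between 0 and 1
    died:      observed outcomes, 1 for death and 0 for survival
    groups:    number of risk groups (10 gives deciles of risk)
    """
    if len(predicted) != len(died):
        raise ValueError("predicted and died must have the same length")
    if len(predicted) < groups:
        raise ValueError("need at least one patient per group")
    if any(p <= 0.0 or p >= 1.0 for p in predicted):
        raise ValueError("probabilities must lie strictly between 0 and 1")

    # Order patients from lowest to highest predicted risk.
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    c_stat = 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in this group
        observed = sum(d for _, d in chunk)   # observed deaths in this group
        # Combines the death and survival cells of the group into one term.
        c_stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))

    df = groups - 2  # assumption: development-sample convention
    return c_stat, df, chi2_sf_even_df(c_stat, df)
```

With the default of ten groups the statistic has eight degrees of freedom under these assumptions. A large p value returned by `hosmer_lemeshow_c` would be read as no evidence of a difference between observed and expected deaths across the risk groups.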