Using Critical Success Index or Gilbert Skill Score as composite measures of positive predictive value and sensitivity in diagnostic accuracy studies: Weather forecasting informing epilepsy research

Mbizvo, Gashirai K; Bennett, Kyle; Simpson, Colin R; Duncan, Susan E.; Chin, Richard; Larner, A. J.

doi:10.1111/epi.17537

Cited by 9 publications

(7 citation statements)

References 10 publications


“…For example, high EI values could result from very high numbers of TN alone even if numbers of TP were modest as long as numbers of FP and FN were few, a situation which may be encountered for example when handling administrative health datasets [22] and polygenic hazard scores [23]. Addressing the class imbalance problem using methods which oversample the minority class (in the current example TP cases), such as variants of the synthetic majority oversampling technique, SMOTE [21], or which undersample the majority class (in the current example TN) might be applicable.…”

Section: Limitations (mentioning)

confidence: 99%

Efficiency Index for Binary Classifiers: Concept, Extension, and Application

Larner

2023

Preprint


Many metrics exist for the evaluation of binary classifiers, all with their particular advantages and shortcomings. Recently an “Efficiency Index” for the evaluation of classifiers has been proposed, based on the consistency, or matching, and contradiction, or mismatching, of outcomes. This metric and its confidence intervals are easy to calculate from base data in a 2x2 contingency table, and values can be qualitatively and semi-quantitatively categorised. For medical tests, in which context the Efficiency Index was originally proposed, it facilitates communication of risk (of correct diagnosis versus misdiagnosis) to both clinicians and patients. Variants of the efficiency index (balanced, unbiased) which take into account disease prevalence and test cut-offs have also been described. The objectives of the current paper were firstly to extend the EI construct to other formulations (balanced level, quality) and secondly to explore the utility of EI and all four of its variants when applied to the dataset of a large prospective test accuracy study of a cognitive screening instrument. This showed that the balanced level, quality, and unbiased formulations of EI are more stringent measures.
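The class-imbalance concern raised in the citing statement can be illustrated with a small sketch. It assumes the Efficiency Index is the ratio of matched to mismatched outcomes, EI = (TP + TN) / (FP + FN); that formula, the function name `efficiency_index`, and all counts below are assumptions made for illustration and are not taken from the abstract.

```python
def efficiency_index(tp: int, fp: int, fn: int, tn: int) -> float:
    """Ratio of matched (TP + TN) to mismatched (FP + FN) outcomes.

    Assumed definition (see the note above); undefined when FP + FN == 0.
    """
    mismatched = fp + fn
    if mismatched == 0:
        raise ValueError("EI is undefined when FP + FN == 0")
    return (tp + tn) / mismatched


# Hypothetical counts: identical TP, FP, and FN, but very different TN.
balanced = efficiency_index(tp=40, fp=10, fn=10, tn=40)      # 80 / 20
imbalanced = efficiency_index(tp=40, fp=10, fn=10, tn=4000)  # 4040 / 20

print(balanced, imbalanced)
```

With these made-up counts the index rises roughly fifty-fold purely because TN grows, even though the handling of positive cases is unchanged.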



“…This circumstance makes it difficult to rank the diagnostic accuracy of the corresponding case-ascertainment algorithms based on Spec, NPV, or Acc, as the figures are all similarly high [32]. In conditions such as dementia [33], motor neuron disease [34], and epilepsy [35, 36], systematic reviews of the diagnostic accuracy of routine data indicate that the original studies published have largely measured the positive predictive value (PPV) and sensitivity (Sens) without measuring Spec or NPV.…”

Section: Introduction (mentioning)

confidence: 99%

“…We have demonstrated the advantages of using CSI to complement traditional diagnostic accuracy measures using real-world data in several conditions [32, 40, 57].…”

Section: Introduction (mentioning)

confidence: 99%

On the Dependence of the Critical Success Index (CSI) on Prevalence

Mbizvo, Larner

2024

Diagnostics

Self Cite


The critical success index (CSI) is an established metric used in meteorology to verify the accuracy of weather forecasts. It is defined as the ratio of hits to the sum of hits, false alarms, and misses. Translationally, CSI has gained popularity as a unitary outcome measure in various clinical situations where large numbers of true negatives may influence the interpretation of other, more traditional, outcome measures, such as specificity (Spec) and negative predictive value (NPV), or when unified interpretation of positive predictive value (PPV) and sensitivity (Sens) is needed. The derivation of CSI from measures including PPV has prompted questions as to whether and how CSI values may vary with disease prevalence (P), just as PPV estimates are dependent on P, and hence whether CSI values are generalizable between studies with differing prevalences. As no detailed study of the relation of CSI to prevalence has been undertaken hitherto, the dataset of a previously published test accuracy study of a cognitive screening instrument was interrogated to address this question. Three different methods were used to examine the change in CSI across a range of prevalences, using both the Bayes formula and equations directly relating CSI to Sens, PPV, P, and the test threshold (Q). These approaches showed that, as expected, CSI does vary with prevalence, but the dependence differs according to the method of calculation that is adopted. Bayesian rescaling of both Sens and PPV generates a concave curve, suggesting that CSI will be maximal at a particular prevalence, which may vary according to the particular dataset.
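To make the prevalence dependence concrete, here is a minimal sketch of one simple approach: hold Sens and Spec fixed, rescale PPV with Bayes' formula, and recompute CSI from the identity CSI = 1 / (1/PPV + 1/Sens - 1). This is not claimed to reproduce any of the three methods used in the paper; the function names and the Sens and Spec values are hypothetical.

```python
def csi_from_ppv_sens(ppv: float, sens: float) -> float:
    """CSI expressed through PPV and Sens: 1 / (1/PPV + 1/Sens - 1)."""
    return 1.0 / (1.0 / ppv + 1.0 / sens - 1.0)


def ppv_at_prevalence(sens: float, spec: float, p: float) -> float:
    """Bayes' formula for PPV at prevalence p, holding Sens and Spec fixed."""
    return sens * p / (sens * p + (1.0 - spec) * (1.0 - p))


# Hypothetical test characteristics, for illustration only.
SENS, SPEC = 0.8, 0.9

csi_by_prevalence = {}
for p in (0.1, 0.3, 0.5):
    ppv = ppv_at_prevalence(SENS, SPEC, p)
    csi_by_prevalence[p] = csi_from_ppv_sens(ppv, SENS)
    print(f"P={p:.1f}  PPV={ppv:.3f}  CSI={csi_by_prevalence[p]:.3f}")
```

Because only PPV is rescaled here, CSI simply rises with prevalence in this sketch; the concave curve described in the abstract arises when Sens is rescaled as well, which this example does not attempt.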


“…In terms of the base data of the 2x2 contingency table: CSI may also be expressed in terms of PPV and Sens: We have demonstrated the advantages of using CSI to complement traditional diagnostic accuracy measures using real-world data in several conditions [3, 9, 17]. A question often raised about CSI concerns how its values relate to prevalence, P, the probability of a positive diagnosis. It is well-known that values of PPV vary with P, hence are sensitive to class imbalance and may therefore not be generalizable between studies.…”

Section: Introduction (mentioning)

confidence: 99%
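
The two displayed equations in the quoted passage above did not survive extraction. A reconstruction consistent with the definition in the published abstract (hits over the sum of hits, false alarms, and misses), writing these as TP, FP, and FN, would be as follows. The second form follows algebraically from PPV = TP/(TP + FP) and Sens = TP/(TP + FN); the notation actually printed in the paper may differ.

```latex
\[
\mathrm{CSI} = \frac{TP}{TP + FP + FN}
\]
\[
\mathrm{CSI} = \frac{1}{\dfrac{1}{\mathrm{PPV}} + \dfrac{1}{\mathrm{Sens}} - 1}
\]
```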

“…This circumstance makes it difficult to rank the diagnostic accuracy of the corresponding case-ascertainment algorithms based on Spec, NPV, or Acc, as the figures are all similarly high [3]. In conditions such as dementia [4], motor neurone disease [5], and epilepsy [2], systematic reviews of the diagnostic accuracy of routine data indicate that the original studies published have largely measured positive predictive value (PPV) and Sens without measuring Spec or NPV. This is because finding true negative cases in the community to verify an absent diagnostic code in a routine dataset is a challenge for researchers, who often only have permission to study populations that have been positively coded with the disease in question.…”

Section: Introduction (mentioning)

confidence: 99%

On the dependence of the critical success index (CSI) on prevalence

Mbizvo, Larner

2023

Preprint

Self Cite


Recently the critical success index (CSI) has been increasingly discussed and advocated as a unitary outcome measure in various clinical situations where large numbers of true negatives may influence the interpretation of other, more traditional, outcome measures such as sensitivity and specificity, or when unified interpretation of positive predictive value (PPV) and sensitivity (Sens) is needed. The derivation of CSI from measures including PPV has prompted questions as to whether and how CSI values may vary with disease prevalence (P), just as PPV estimates are dependent on P, and hence whether CSI values are generalizable between studies with differing prevalences. As no detailed study of the relation of CSI to prevalence has been undertaken hitherto, the dataset of a previously published test accuracy study of a cognitive screening instrument was reinterrogated to address this question. Three different methods were used to examine the change in CSI across a range of prevalences, using both Bayes' formula and equations directly relating CSI to Sens, PPV, P, and the test threshold (Q). These approaches showed that, as expected, CSI does vary with prevalence, but the dependence differs according to the method of calculation adopted. Bayesian rescaling of both Sens and PPV generates a concave curve, suggesting that CSI will be maximal at a particular prevalence, which may vary according to the particular dataset.

