Purpose: The reverse fragility index (RFI) is a novel metric to appraise the results of studies reporting statistically non‐significant results. The purpose of this study was to determine the statistical robustness of randomized controlled trials (RCTs) reporting non‐significant differences in anterior cruciate ligament reconstruction (ACLR) graft failure rates, defined as re‐rupture/revision ACLR rate, between hamstring tendon (HT) and bone–patellar tendon–bone (BTB) autografts by calculating RFIs.
Methods: A systematic review was performed to identify RCTs published through January 2022 that compared HT to BTB grafts for ACLR. Studies reporting non‐significant (n.s.) differences in graft re‐rupture/revision ACLR rates were included. The RFI, defined as the fewest number of event reversals needed to change the non‐significant graft re‐rupture/revision outcome to statistically significant (P < 0.05), was recorded for each study. In addition, the number of studies in which the loss to follow‐up exceeded the RFI was recorded.
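The RFI definition above can be illustrated with a minimal sketch. The abstract does not name the significance test or the direction in which events are reversed, so this sketch assumes a two‐sided Fisher exact test and takes the smallest number of reversals over both arms and both directions. The function names `fisher_exact_two_sided` and `reverse_fragility_index` are hypothetical, and the counts in the usage line are illustrative, not data from the included trials.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    a, b = events and non-events in arm 1; c, d = events and non-events in arm 2.
    Sums the probabilities of all tables (with fixed margins) that are no more
    likely than the observed one.
    """
    n1, n2 = a + b, c + d
    k = a + c  # total number of events across both arms
    lo, hi = max(0, k - n2), min(n1, k)

    def weight(x):
        # Unnormalised hypergeometric probability of x events in arm 1.
        return comb(n1, x) * comb(n2, k - x)

    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n1 + n2, k)


def reverse_fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Fewest single-arm event reversals that make a non-significant result significant.

    Assumption: reversals are applied to one arm at a time, either adding or
    removing events, with arm sizes held constant. Returns None if no number
    of reversals reaches significance.
    """
    if fisher_exact_two_sided(e1, n1 - e1, e2, n2 - e2) < alpha:
        raise ValueError("result is already statistically significant")

    best = None
    for arm, step in ((1, 1), (1, -1), (2, 1), (2, -1)):
        a, c, flips = e1, e2, 0
        while True:
            if arm == 1:
                a += step
            else:
                c += step
            flips += 1
            if not (0 <= a <= n1 and 0 <= c <= n2):
                break  # ran out of patients to reverse in this direction
            if fisher_exact_two_sided(a, n1 - a, c, n2 - c) < alpha:
                if best is None or flips < best:
                    best = flips
                break
    return best


# Illustrative counts only: 0/10 events in each arm.
print(reverse_fragility_index(0, 10, 0, 10))  # prints 5
```

Under these assumptions, a study's neutrality would be flagged as fragile when its number of participants lost to follow‐up exceeds the value returned by `reverse_fragility_index`.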
Results: Among the 16 included RCTs, the median (interquartile range [IQR]) sample size was 71 (64–114), and the median (IQR) total number of graft re‐rupture/revision ACLR events was 4 (4–6). The median (IQR) graft re‐rupture/revision ACLR rate was 4.3% (3.0–6.4) overall, 4.1% (2.6–6.7) in the BTB group, and 5.4% (3.0–6.3) in the HT group. The median (IQR) RFI was 3 (3–4), signifying that a reversal of the outcome in 3 patients in one arm was needed to flip the studies’ result from non‐significant to statistically significant (P < 0.05). The median (IQR) number of participants lost to follow‐up was 11 (3–13), and 13 (81.3%) of the included studies had a loss to follow‐up greater than the studies’ RFI.
Conclusion: The results of RCTs reporting statistically non‐significant re‐rupture/revision ACLR rates between HT and BTB autografts would become significant if the outcome were reversed in a small number of patients, a number that was less than the loss to follow‐up in the majority of studies. Thus, the neutrality of these studies is fragile, and a true statistically significant difference in re‐rupture/revision rates may have gone undetected.
Level of evidence: Level I.