“…RAR designs are commonly compared using inferential and estimation metrics (e.g., type I error, power, and bias) rather than measures of patient benefit, which remain underreported and have received little attention in the RAR literature (Robertson et al., 2020). This is in part because existing patient benefit metrics, including the expected number of trial failures, the proportion of patients assigned to the inferior arm, and the probability of a treatment imbalance in the wrong direction, are often limited by a failure either to hold type I and type II error rates constant or to account for the different sample size requirements of the designs under consideration (Karrison et al., 2003; Morgan and Coad, 2007; Zhu and Hu, 2010; Robertson et al., 2020). One approach to addressing the latter issue is to compare designs with respect to the expected number of failures within a finite patient horizon (Villar et al., 2015a,b).…”