Inference after two-stage single-arm designs with a binary endpoint is challenging due to the nonunique ordering of the sampling space in multistage designs. We illustrate the problem of specifying test-compatible confidence intervals for designs with nonconstant second-stage sample size and present two approaches that guarantee confidence intervals consistent with the test decision. First, we extend the well-known Clopper-Pearson approach of inverting a family of two-sided hypothesis tests from the group-sequential case to designs with fully adaptive sample size. Test compatibility is achieved by using a sample space ordering that is derived from a test-compatible estimator. The resulting confidence intervals tend to be conservative but assure the nominal coverage probability. To assess the possibility of further improving these confidence intervals, we pursue a direct optimization approach that minimizes the mean width of the confidence intervals. While the latter approach produces more stable coverage probabilities, it is also slightly anti-conservative and yields only negligible improvements in mean width. We conclude that the Clopper-Pearson-type confidence intervals based on a test-compatible estimator are the best choice if the nominal coverage probability is not to be undershot and compatibility of test decision and confidence interval is to be preserved.
KEYWORDS
adaptive designs, binary endpoint, confidence interval, two-stage designs
BACKGROUND
Single-arm two-stage designs are frequently applied in phase II of oncology research. Such designs allow testing the one-sided null hypothesis $H_0\colon p \le p_0$ that the response probability $p$ of a new treatment is smaller than or equal to $p_0$ at level $\alpha$ and with power $1 - \beta$ at some point alternative $p_1 > p_0$. Simon's optimal designs (Simon, 1989) are the most popular two-stage designs for phase II oncology trials (Ivanova et al., 2016). These group-sequential designs with prefixed stage-wise sample size minimize the expected sample size on the boundary of the null hypothesis $H_0$ under the given restrictions on type I and type II error rates. Extensions of Simon's group-sequential designs that allow the stage-two sample size to vary with the number of observed interim responses have been proposed by several authors (Banerjee and Tsiatis, 2006; Mander and Thompson, 2010; Jin et al., 2012; Englert and Kieser, 2013; Shan, Wilding, Hutson and Gerstenberger, 2016; Kunzmann and Kieser, 2016). Interval estimation after two-stage designs is complicated by the bias due to the response-adaptive sampling scheme. The group-sequential situation has been studied extensively (Jennison and Turnbull, 2000) and was applied to Simon's designs (Koyama and Chen, 2008). The principal idea is to resolve the inherent arbitrariness of p values in multistage designs by resorting to the stage-wise ordering that is induced by the uniformly minimum variance unbiased estimator (UMVUE) for the response probability $p$. The obtained