Irreversible Adaptive Allocation Rules

Hu, Inchi; Wei, Chen-Yu

doi:10.1214/aos/1176347144

Cited by 11 publications

(10 citation statements)

References 9 publications

“…On the other hand, Lai [17] and Chang and Lai [6] have proposed simple index-type adaptive allocation rules that are asymptotically optimal in both the Bayes and frequentist senses either as N → ∞ (under uniform discounting) or as β → 1 (under geometric discounting). Brezzi and Lai [5] have recently refined and modified these adaptive allocation rules in the presence of switching costs, while Hu and Wei [15] have constructed asymptotically optimal adaptive allocation rules subject to the irreversibility constraint. Various applications of the theory of multi-armed bandits can be found in sequential clinical trials, market pricing, labor markets and search problems; see e.g.…”

Section: Introduction (mentioning)

Some results on the Gittins index for a normal reward process

Yao

2006

Institute of Mathematical Statistics Lecture Notes - Monograph Series

We consider the Gittins index for a normal distribution with unknown mean θ and known variance where θ has a normal prior. In addition to presenting some monotonicity properties of the Gittins index, we derive an approximation to the Gittins index by embedding the (discrete-time) normal setting into the continuous-time Wiener process setting in which the Gittins index is determined by the stopping boundary for an optimal stopping problem. By an application of Chernoff's continuity correction in optimal stopping, the approximation includes a correction term which accounts for the difference between the discrete and continuous-time stopping boundaries. Numerical results are also given to assess the performance of this simple approximation.
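As background for the abstract above, the normal setting and the standard discounted Gittins index can be written as follows. The notation is an assumption on our part: β ∈ (0, 1) is the discount factor, μ_n and σ_n² are the posterior mean and variance of θ after n observations, and τ ranges over stopping times adapted to the observations. This is the textbook definition, not the continuous-time approximation or the correction term derived in the paper.

```latex
\begin{aligned}
X_1, X_2, \dots \mid \theta &\overset{\text{iid}}{\sim} N(\theta, \sigma^2),
  \qquad \theta \sim N(\mu_0, \sigma_0^2), \\
\frac{1}{\sigma_n^2} &= \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2},
  \qquad
  \mu_n = \sigma_n^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} X_i \right), \\
\nu(\mu_n, \sigma_n^2) &= \sup_{\tau \ge 1}
  \frac{E\left[ \sum_{t=0}^{\tau-1} \beta^{t} \, \mu_{n+t} \right]}
       {E\left[ \sum_{t=0}^{\tau-1} \beta^{t} \right]}
\end{aligned}
```

The reward credited at each step is the posterior mean μ_{n+t}, which by the tower property has the same expectation as the next observation X_{n+t+1}.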

“…The proof can be found in [1]. We will discuss the relation of the lower bound with those in [6,7] and [3]. Theorem 1.…”

Section: The Regret Lower Bound (mentioning)

“…Here we further explore optimality properties of the proposed strategies. First, we show that the efficiency benchmark, which is given by the regret lower bound, reduces to those in Lai and Robbins (1985), Wei (1989), and Hu (2000). This implies that the proposed strategy is also optimal under the settings of aforementioned papers.…”

(mentioning)

Multi-armed bandit problem with precedence relations

Chan, Fuh, Hu

2006

Institute of Mathematical Statistics Lecture Notes - Monograph Series

Self Cite

Consider a multi-phase project management problem where the decision maker needs to deal with two issues: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward is as large as possible. We formulate the problem as a multi-armed bandit problem with precedence relations. In Chan, Fuh and Hu (2005), a class of asymptotically optimal arm-pulling strategies is constructed to minimize the shortfall from the perfect-information payoff. Here we further explore optimality properties of the proposed strategies. First, we show that the efficiency benchmark, which is given by the regret lower bound, reduces to those in Lai and Robbins (1985), Wei (1989), and Hu (2000). This implies that the proposed strategies are also optimal under the settings of the aforementioned papers. Second, we establish the super-efficiency of the proposed strategies when the bad set is empty. Third, we show that they remain optimal with a constant switching cost between arms. In addition, we prove that Wald's equation holds for Markov chains under the Harris recurrence condition, a result that is an important tool in studying the efficiency of the proposed strategies.
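The abstract measures efficiency against a regret lower bound that reduces to that of Lai and Robbins (1985). As a minimal sketch of that special case only, assuming independent Bernoulli arms with known means (the precedence-relation setting of the paper is more general), the Lai and Robbins constant C in liminf R_n / log n ≥ C can be computed as below. The function names `bernoulli_kl` and `lai_robbins_constant` are hypothetical helpers, not code from the paper.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence KL(Ber(p) || Ber(q)), assuming 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lai_robbins_constant(means):
    """Hypothetical helper: constant C with liminf R_n / log n >= C.

    Assumes Bernoulli arms. Each arm j whose mean is below the best mean mu*
    contributes (mu* - mu_j) / KL(mu_j, mu*); optimal arms contribute nothing.
    """
    if any(not 0.0 < m < 1.0 for m in means):
        raise ValueError("means must lie strictly between 0 and 1")
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


if __name__ == "__main__":
    # Two arms with a gap of 0.1: any uniformly good rule has regret of order C * log n.
    c = lai_robbins_constant([0.9, 0.8])
    print(f"C = {c:.3f}; lower bound at n = 10000 is about {c * math.log(10000):.1f}")
```

The bound depends on each suboptimal arm through both its gap and its KL divergence from the best arm, which is why arms that are hard to distinguish from the best one dominate the constant.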

“…The improvements are due to adaptations of UCB to take into account the unavoidable experimentation costs unique to a particular problem, in this case the higher costs when the number of arms is large. The construction of optimal bandit algorithms for irreversible rules in Hu and Wei (1989) is also based on this principle.…”

Section: Introduction (mentioning)

Optimal UCB Adjustments for Large Arm Sizes

Chan, Hu

2019

Preprint

The regret lower bound of Lai and Robbins (1985), the gold standard for checking the optimality of bandit algorithms, holds the arm size fixed as the sample size goes to infinity. We show that when the arm size increases polynomially with the sample size, a surprisingly smaller lower bound is achievable. This is because the larger experimentation costs incurred when there are more arms permit regret savings by exploiting the best performer more often. In particular, we construct a UCB-Large algorithm that adaptively exploits more when there are more arms. It achieves the smaller lower bound and is thus optimal. Numerical experiments show that UCB-Large performs better than both classical UCB, which does not correct for arm size, and Thompson sampling.
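For comparison with the classical UCB baseline mentioned in the abstract, here is a minimal sketch of the standard UCB1 index (empirical mean plus sqrt(2 ln t / n_j)) on simulated Bernoulli arms. It does not implement UCB-Large, whose arm-size correction is not described here; the function name `ucb1`, the Bernoulli reward model, and the fixed seed are illustrative assumptions.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Hypothetical helper: classical UCB1 (not UCB-Large) on Bernoulli arms.

    Returns (pull counts per arm, pseudo-regret sum_j n_j * (mu* - mu_j)).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every index is defined
        else:
            # Index = empirical mean + exploration bonus sqrt(2 log t / n_j).
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j] + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    regret = sum(n * (best - m) for n, m in zip(counts, means))
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.9, 0.6, 0.5], horizon=5000)
    print("pulls per arm:", counts, "pseudo-regret:", round(regret, 1))
```

The bonus shrinks as an arm is sampled more and grows slowly with t, so every arm keeps being revisited on a logarithmic schedule; this fixed exploration rate, independent of the number of arms, is what the abstract describes UCB-Large as adjusting.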
