The Budgeted Multi-armed Bandit Problem

Madani, Omid; Lizotte, Daniel J.; Greiner, Russell

doi:10.1007/978-3-540-27819-1_46

Cited by 31 publications

(28 citation statements)

References 1 publication

“…Figure 3 supports the predictiveness of (12) with respect to relative empirical costs. Note that, if one assumes the same target confidence τ for all bandits b_i ∈ B, then accounting for τ in ĉ_i would not affect the ordering of bandits according to ĉ_i, as an "easier" bandit according to (12) would also have a smaller expected completion cost for any value of τ. By using (12), we also ignore the effort previously expended on a given bandit.…”

Section: Bayesian Greedy Confidence Pursuit (mentioning)

confidence: 99%

“…While considering the number of trials already spent on a bandit could improve on the performance of (12), it would require steps to avoid the "sunk-cost" fallacy of economics, as manifested by premature commitment to bandits wrongly identified as "easy". (12). From darkest to lightest the points represent selecting the top 1, 3, and 5 arms of a 10-armed bandit.…”

Section: Bayesian Greedy Confidence Pursuit (mentioning)

confidence: 99%

“…(3). Note that (12) captures dependence on the subset size m through its use of Γ_i and that ĉ_i becomes stochastic when sampled with respect to the per-bandit posteriors over returns and gaps. A theoretical analysis of our allocation process is beyond the scope of this paper, but the properties of posterior sampling described in Section 3 suggest it will efficiently direct trials towards the bandits with minimal ĉ_i.…”

Section: Bayesian Greedy Confidence Pursuit (mentioning)

confidence: 99%

“…We begin our empirical examination of greedy confidence pursuit with tests supporting (12) as an approximate completion cost. These tests were based on selecting the top 1, 3, and 5 arms of 10-armed bandits, with returns distributed as in Section 4.…”

Section: Testing Greedy Confidence Pursuit (mentioning)

confidence: 99%

“…The machine learning community has developed ways to formulate and address variations on this problem. For example, budgeted learning [12] and subsequent work considers the problem of active learning when a fixed budget is given for probing which model among a collection of models is best for a given task.…”

Section: Introduction (mentioning)

confidence: 99%

Greedy Confidence Pursuit: A Pragmatic Approach to Multi-bandit Optimization

Bachman

Precup

2013

Advanced Information Systems Engineering

Abstract. We address the practical problem of maximizing the number of high-confidence results produced among multiple experiments sharing an exhaustible pool of resources. We formalize this problem in the framework of bandit optimization as follows: given a set of multiple multi-armed bandits and a budget on the total number of trials allocated among them, select the top-m arms (with high confidence) for as many of the bandits as possible. To solve this problem, which we call greedy confidence pursuit, we develop a method based on posterior sampling. We show empirically that our method outperforms existing methods for top-m selection in single bandits, which has been studied previously, and improves on baseline methods for the full greedy confidence pursuit problem, which has not been studied previously.
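The abstract above describes top-m arm selection under a trial budget using posterior sampling. The following is a minimal sketch of that general idea for a single bandit, not the method of Bachman and Precup: it assumes Bernoulli returns and Beta(1, 1) priors, and the rule of pulling an arm at the top-m boundary of each posterior draw is an illustrative choice. The function name `top_m_posterior_sampling` and all parameters are hypothetical.

```python
import random

def top_m_posterior_sampling(true_means, m, budget, seed=0):
    """Spend `budget` Bernoulli trials on one bandit, then return the indices
    of the m arms with the highest posterior mean (requires m < number of arms).
    Beta(1, 1) priors and the boundary-pull rule are assumptions of this sketch."""
    rng = random.Random(seed)
    k = len(true_means)
    wins = [1] * k    # Beta alpha parameters (prior pseudo-count of 1)
    losses = [1] * k  # Beta beta parameters (prior pseudo-count of 1)
    for _ in range(budget):
        # Draw one plausible mean per arm from its current posterior.
        draws = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
        order = sorted(range(k), key=lambda i: draws[i], reverse=True)
        # Pull one of the two arms straddling the top-m boundary in this draw;
        # these are the arms whose relative ranking decides the answer.
        arm = rng.choice([order[m - 1], order[m]])
        if rng.random() < true_means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    post_mean = [wins[i] / (wins[i] + losses[i]) for i in range(k)]
    best = sorted(range(k), key=lambda i: post_mean[i], reverse=True)[:m]
    return sorted(best)

# Example: select the top 2 of 5 arms with 3000 trials.
selected = top_m_posterior_sampling([0.95, 0.9, 0.05, 0.05, 0.05], m=2, budget=3000)
```

Extending this to several bandits sharing one budget would additionally require a rule for choosing which bandit receives the next trial, which is the role the citation statements assign to the approximate completion cost (12).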

Nearly Optimal Exploration-Exploitation Decision Thresholds

Dimitrakakis

2006

Artificial Neural Networks – ICANN 2006

While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. In this paper, we first derive upper bounds for the utility of selecting different actions in the multi-armed bandit setting. Unlike the common statistical upper confidence bounds, these explicitly link the planning horizon, uncertainty and the need for exploration. The resulting algorithm can be seen as a generalisation of the classical Thompson sampling algorithm. We experimentally test these algorithms, as well as ε-greedy and the value of perfect information heuristics. Finally, we also introduce the idea of bagging for reinforcement learning. By employing a version of online bootstrapping, we can efficiently sample from an approximate posterior distribution.

Thanks to M. Keller and R. Chavarriaga for comments and interesting discussions. This work has received financial support from the Swiss NSF under the MULTI project (2000-068231.021/1) and from IDIAP. This updated version better discusses earlier work and places this paper in a proper context.
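For readers unfamiliar with the two baselines named in this abstract, here is a minimal sketch of classical Thompson sampling and ε-greedy on a Bernoulli bandit. It does not implement the paper's decision thresholds or its bootstrapping scheme. The function `run_bandit`, the Beta(1, 1) prior, the optimistic value for unpulled arms, and the default ε = 0.1 are all assumptions made for illustration.

```python
import random

def run_bandit(policy, true_means, horizon, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit for `horizon` steps with the given policy.
    Returns (total_reward, pulls_per_arm)."""
    rng = random.Random(seed)
    k = len(true_means)
    wins = [0] * k
    pulls = [0] * k
    total = 0
    for _ in range(horizon):
        if policy == "thompson":
            # Sample a mean for each arm from its Beta posterior (Beta(1, 1)
            # prior assumed) and play the arm with the largest sample.
            arm = max(range(k), key=lambda i: rng.betavariate(
                wins[i] + 1, pulls[i] - wins[i] + 1))
        elif policy == "epsilon_greedy":
            if rng.random() < epsilon:
                arm = rng.randrange(k)  # explore uniformly at random
            else:
                # Exploit the best empirical mean; unpulled arms get an
                # optimistic value of 1.0 so each arm is tried at least once.
                arm = max(range(k), key=lambda i:
                          wins[i] / pulls[i] if pulls[i] else 1.0)
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls

means = [0.2, 0.5, 0.8]
ts_total, ts_pulls = run_bandit("thompson", means, horizon=2000)
eg_total, eg_pulls = run_bandit("epsilon_greedy", means, horizon=2000)
```

Thompson sampling explores in proportion to posterior uncertainty, while ε-greedy explores at a fixed rate regardless of how much has been learned; comparing `ts_pulls` and `eg_pulls` shows where each policy spent its trials.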

Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint

Burnetas

Kanavetas

2012

Applications of Mathematics and Informatics in Military Science

We consider the problem of sequential sampling from a finite number of independent statistical populations to maximize the expected infinite horizon average outcome per period, under a constraint that the expected average sampling cost does not exceed an upper bound. The outcome distributions are not known. We construct a class of consistent adaptive policies, under which the average outcome converges with probability 1 to the true value under complete information for all distributions with finite means. We also compare the rate of convergence for various policies in this class using simulation.
