Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/350

Thresholding Bandits with Augmented UCB

Abstract: In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability…)
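As a rough illustration of the idea described in the abstract, the sketch below implements a simplified variance-aware elimination loop for a fixed-budget thresholding bandit. It is not the authors' AugUCB algorithm: the confidence radius, its constants, and the `threshold_bandit_sketch` helper are assumptions made for illustration only.

```python
import math
import random


def threshold_bandit_sketch(arms, tau, budget, delta=0.1):
    """Simplified fixed-budget thresholding-bandit loop (illustrative only).

    `arms` is a list of zero-argument callables, each returning a noisy reward.
    Arms whose variance-aware confidence interval lies entirely on one side of
    the threshold `tau` are removed from further sampling; the remaining budget
    is spent on the still-ambiguous arms.
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    sq_sums = [0.0] * n
    active = set(range(n))
    above = set()  # arms already classified as having mean >= tau

    def pull(i):
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        sq_sums[i] += r * r

    # Initialise every arm with one sample.
    for i in range(n):
        pull(i)
    budget -= n

    while budget > 0 and active:
        for i in list(active):
            if budget <= 0:
                break
            pull(i)
            budget -= 1
            mean = sums[i] / counts[i]
            var = max(sq_sums[i] / counts[i] - mean * mean, 0.0)
            # Variance-aware (Bernstein/UCB-V style) confidence radius.
            # The constants and exploration term used by AugUCB differ;
            # these values are placeholders for illustration.
            rad = (math.sqrt(2.0 * var * math.log(1.0 / delta) / counts[i])
                   + 3.0 * math.log(1.0 / delta) / counts[i])
            if mean - rad >= tau:        # confidently above the threshold
                above.add(i)
                active.discard(i)
            elif mean + rad <= tau:      # confidently below the threshold
                active.discard(i)

    # Classify any still-ambiguous arms by their empirical mean.
    for i in active:
        if sums[i] / counts[i] >= tau:
            above.add(i)
    return above


if __name__ == "__main__":
    random.seed(0)
    # Four Gaussian arms; arms 2 and 3 have means above the threshold 0.7.
    arms = [lambda m=m: random.gauss(m, 1.0) for m in (0.2, 0.5, 0.8, 1.1)]
    print(threshold_bandit_sketch(arms, tau=0.7, budget=2000))
```

The variance term lets low-variance arms be settled after fewer pulls, which reflects the intuition stated in the abstract for using both mean and variance estimates.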

Cited by 12 publications (10 citation statements). References 13 publications (18 reference statements).

Citation statements:
“…(Chen et al, 2014) also develops the CSAR algorithm for the fixed-budget setting which can also be used for TBP. The result was improved by recent followup work (Locatelli et al, 2016;Mukherjee et al, 2017) under the fixed budget setting. Chen et al (2015) considered TBP in the context of budget allocation for crowdsourced classification in the Bayesian framework.…”
Section: Introduction
confidence: 87%
“…We note that previous works on TBP in the fixed budget setting (Locatelli et al, 2016;Mukherjee et al, 2017) cannot be implemented in our fixed-confidence setting.…”
Section: Baselines and Implementation Details
confidence: 96%
“…We consider three settings named Threshold 1-3, which are based on Experiment 1-2 in Locatelli et al (2016) and Experiment 4 in Mukherjee et al (2017).…”
Section: Threshold Settings
confidence: 99%
“…This variant of the multi-armed bandit problem was introduced by Locatelli et al (2016), who provided an algorithm for solving the problem with matching upper and lower bounds. Mukherjee et al (2017) and Zhong et al (2017) have since provided algorithmic extensions that incorporate variance estimates and provide guarantees in asynchronous settings.…”
Section: Thresholding Bandits
confidence: 99%