2017
DOI: 10.1109/tsp.2017.2706192

On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits

Abstract: We consider the best-arm identification problem in multi-armed bandits, which focuses purely on exploration. A player is given a fixed budget to explore a finite set of arms, and the rewards of each arm are drawn independently from a fixed, unknown distribution. The player aims to identify the arm with the largest expected reward. We propose a general framework to unify sequential elimination algorithms, where the arms are dismissed iteratively until a unique arm is left. Our analysis reveals a novel performan…
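The sequential-elimination idea the abstract describes, dismissing arms round by round until one survives, can be sketched as below. This is a minimal illustration, not the paper's algorithm: the equal per-round budget split and the Bernoulli arms are assumptions made here for the example.

```python
import random

def sequential_elimination(arms, budget, seed=0):
    """Pull arms in rounds, dismissing the empirically worst arm each
    round until a single arm remains; return the survivor's index.

    `arms` is a list of no-argument callables, each returning one random
    reward. The equal per-round budget split below is illustrative; the
    paper unifies more refined allocation schedules under one framework.
    """
    random.seed(seed)
    K = len(arms)
    active = list(range(K))
    totals = [0.0] * K
    counts = [0] * K
    per_round = budget // (K - 1)  # K-1 elimination rounds in total
    for _ in range(K - 1):
        pulls_each = max(1, per_round // len(active))
        for a in active:
            for _ in range(pulls_each):
                totals[a] += arms[a]()
                counts[a] += 1
        # Dismiss the arm with the lowest empirical mean so far.
        worst = min(active, key=lambda a: totals[a] / counts[a])
        active.remove(worst)
    return active[0]

# Illustrative Bernoulli arms with means 0.2, 0.5, 0.8.
means = [0.2, 0.5, 0.8]
arms = [lambda m=m: 1.0 if random.random() < m else 0.0 for m in means]
best = sequential_elimination(arms, budget=3000)
```

With this budget the arm with the largest mean (index 2) is identified with high probability; the fixed seed makes the demo run deterministic.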


Cited by 24 publications (14 citation statements)
References 21 publications
“…The algorithm is summarized in Algorithm 2. Compared with the original UCB algorithm in (3), the main difference is the additional term γ(μ(t − 1), N(t − 1)) in (15). We now highlight the main idea why our bandit algorithm works and the role of this additional term.…”
Section: A Robust Bandit Algorithm
confidence: 97%
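The passage quoted above describes a UCB index augmented with an additional term γ(μ(t − 1), N(t − 1)). The exact form of γ is not given in the excerpt, so the sketch below treats it as a caller-supplied function with a purely hypothetical placeholder; only the standard UCB bonus term is standard.

```python
import math

def robust_ucb_index(mean_hat, n_pulls, t, gamma):
    """Index of one arm at time t: empirical mean, plus the standard
    UCB exploration bonus sqrt(2 ln t / n), plus an additional term
    gamma(mean_hat, n_pulls) as the quoted passage describes. The form
    of gamma in the cited paper is not reproduced here; it is passed in.
    """
    bonus = math.sqrt(2.0 * math.log(t) / n_pulls)
    return mean_hat + bonus + gamma(mean_hat, n_pulls)

# Hypothetical placeholder gamma: an extra widening that shrinks as the
# arm is pulled more often (for illustration only).
idx = robust_ucb_index(0.6, 25, t=100, gamma=lambda m, n: 1.0 / n)
```

The arm maximizing this index would be pulled next, exactly as in ordinary UCB; the extra term only inflates the index.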
“…else 5: The user chooses arm I_t to pull according to (15). 6: end if 7: if The attacker decides to attack then 8: The attacker attacks and changes I_t to I_t^0.…”
confidence: 99%
“…Xu [9] used MAB models to balance exploiting user data and protecting user privacy in dynamic pricing. Shahrampour [10] proposed a new algorithm for choosing the best arm of a MAB, which outperforms the state of the art. Lacerda [11] proposed an algorithm named Multi-Objective Ranked Bandits for recommender systems.…”
Section: Related Work
confidence: 99%
“…These studies seldom consider the uncertainty of users' behaviours, so this paper introduces an online learning method called multi-armed bandits (MAB) to solve the problem. MAB has shown effectiveness and merit in air conditioning demand aggregation [16] and many other sequential decision-making problems containing uncertain/unknown behavioural factors [17][18][19][20][21][22][23][24][25][26][27]. In reference [28], an adversarial MAB framework is applied to learn the signal response of thermal control loads for demand response in real-time.…”
Section: Introduction
confidence: 99%