2018
DOI: 10.1109/tnnls.2018.2806006
An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem

Abstract: We investigate the adversarial multiarmed bandit problem and introduce an online algorithm that asymptotically achieves the performance of the best switching bandit arm selection strategy. Our algorithms are truly online such that we do not use the game length or the number of switches of the best arm selection strategy in their constructions. Our results are guaranteed to hold in an individual sequence manner, since we have no statistical assumptions on the bandit arm losses. Our regret bounds, i.e., our perf…
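As background for the adversarial bandit setting the abstract describes, the sketch below shows the standard Exp3 baseline (exponential weights with importance-weighted loss estimates). This is a well-known reference algorithm, not the paper's switching-arm method; the function name and parameters are illustrative.

```python
import math
import random

def exp3(n_arms, gamma, loss_fn, horizon):
    """Exp3 for adversarial bandits: maintain exponential weights and
    update only the played arm via an importance-weighted loss estimate.
    (Illustrative baseline, not the paper's algorithm.)"""
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(horizon):
        total_w = sum(weights)
        # Mix the weight distribution with uniform exploration of rate gamma.
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)           # adversarially chosen loss in [0, 1]
        total_loss += loss
        est = loss / probs[arm]          # unbiased importance-weighted estimate
        weights[arm] *= math.exp(-gamma * est / n_arms)
    return total_loss
```

Because only the selected arm's loss is observed, the division by `probs[arm]` keeps the loss estimate unbiased, which is what makes the exponential-weights analysis carry over from the full-information setting.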

Cited by 23 publications (27 citation statements)
References 42 publications
“…In RL, an agent discovers the best action (i.e., height) which yields the most reward (i.e., average cell throughput) through a process of trial and error. With the uniform user distribution and ring-based approximation elaborated in Figure 3, this scenario aligns perfectly with a Markov decision process (MDP) with a single state (i.e., a stationary environment), which can be optimally handled as an RL-based multi-armed bandit (MAB) problem [41]. The aim of MAB is to develop a learning policy that achieves maximal cumulative reward.…”
Section: ABS Height Optimization Using Reinforcement Learning
confidence: 95%
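The single-state MDP described in this citation statement reduces to a stochastic bandit, for which epsilon-greedy is the simplest learning policy. The sketch below is illustrative; the arm means, noise model, and parameter names are assumptions, not taken from the citing paper.

```python
import random

def epsilon_greedy(arm_means, epsilon, horizon, rng):
    """Epsilon-greedy bandit: with probability epsilon explore a uniformly
    random arm, otherwise exploit the empirically best arm.
    (Illustrative sketch with Gaussian reward noise.)"""
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n          # running empirical mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            a = rng.randrange(n)                          # explore
        else:
            a = max(range(n), key=lambda i: values[i])    # exploit
        r = arm_means[a] + rng.gauss(0.0, 0.1)            # noisy reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]          # incremental mean
        total += r
    return total, counts
```

The incremental-mean update avoids storing reward histories, which matches the cumulative-reward objective the quoted passage attributes to MAB policies.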
“…Over the past years, the global optimization problem has garnered significant attention, with various algorithms being proposed in distinct fields of research. It has been studied especially in the fields of non-convex optimization [6]-[8], Bayesian optimization [9], convex optimization [10]-[12], bandit optimization [13], and stochastic optimization [14], [15], because of its practical applications in distribution estimation [16]-[19], multi-armed bandits [20]-[22], control theory [23], signal processing [24], game theory [25], prediction [26], [27], decision theory [28], and anomaly detection [29]-[31].…”
Section: A Motivation
confidence: 99%
“…In these types of applications, we encounter the fundamental dilemma of the exploration-exploitation trade-off, which is most thoroughly studied in the multi-armed bandit problem [43]. To that end, the study of the multi-armed bandit problem has received considerable attention over the years [32], [34], [35], [37], [39], [43]-[45], where the goal is to minimize a loss or maximize a reward in a problem environment by sequentially selecting one of M given actions [46].…”
Section: A Preliminaries
confidence: 99%
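The exploration-exploitation trade-off mentioned in this citation statement is often resolved in the stochastic setting by optimism: UCB1 adds a confidence bonus to each arm's empirical mean. The sketch below is a generic illustration with Bernoulli rewards, not code from any of the cited works.

```python
import math
import random

def ucb1(arm_means, horizon, rng):
    """UCB1: play each arm once, then repeatedly select the arm maximizing
    empirical mean + sqrt(2 ln t / n_i), trading off exploration (rarely
    pulled arms get a large bonus) and exploitation (high empirical means).
    (Illustrative sketch with Bernoulli rewards.)"""
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            a = t - 1                                    # initialize each arm once
        else:
            a = max(range(n),
                    key=lambda i: values[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arm_means[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
        total += r
    return total, counts
```

Unlike epsilon-greedy, UCB1's exploration rate decays automatically as evidence accumulates, which is why its regret grows only logarithmically in the stochastic (non-adversarial) setting.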