2019
DOI: 10.1109/tsp.2019.2928952

Minimax Optimal Algorithms for Adversarial Bandit Problem With Multiple Plays

Abstract: We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching m-arm strategy with minimax optimal regret bounds. To construct our algorithm, we introduce a new expert advice algorithm for the multiple-play setting. By using our expert advice algorithm, we additionally improve the best-known high-probability bound for the multi-play setting by O(√m). Our results are guar…

Cited by 14 publications (14 citation statements)
References 35 publications
“…We study the multi-armed bandit problem in an online setting, where we sequentially operate on a stream of observations from an adversarial environment [54], i.e., we have no statistical assumptions on the loss sequence. To this end, we investigate the multi-armed bandit problem from a competitive algorithm perspective [13], [55]- [60].…”
Section: B. Adversarial Multi-Armed Bandit Problem
confidence: 99%
“…Several algorithms have been introduced to solve the CSB problem that arises from K = S n,k . This problem has been called the k-set problem [Combes et al., 2015], unordered slate [Kale et al., 2010], or bandits with multiple plays [Uchiya et al., 2010, Vural et al., 2019].…”
Section: Minimax Learning on the Capped Simplex
confidence: 99%
“…We use the EXP4.MP algorithm [Vural et al., 2019] for the p-player, which is a variation of EXP4 [Auer et al., 2002]. Each iteration of EXP4.MP has a computational cost of O(n log(n)) and a storage cost of O(n), with a high-probability regret bound of O(√(knT log(n/δ))).…”
Section: EXP4.MP
confidence: 99%
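The excerpt above describes an exponential-weights method that plays m arms per round under semi-bandit feedback. The sketch below is a simplified illustration of that family of updates (weighted sampling of m distinct arms, importance-weighted loss estimates on the observed arms only); it is not the paper's EXP4.MP, and the function name, the rejection-style sampling loop, and the learning-rate choice are all illustrative assumptions.

```python
import math
import random


def exp3_multiple_plays(n_arms, m, T, loss_fn, eta=0.1):
    """Illustrative EXP3-style learner that plays m distinct arms per round.

    Semi-bandit feedback: only the losses of the m played arms are observed,
    and each observed loss is importance-weighted by its selection probability
    before the exponential-weight update. Simplified sketch, not EXP4.MP.
    """
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]

        # Draw m distinct arms by repeated categorical sampling
        # (real algorithms use a dependent-rounding scheme instead).
        arms = set()
        while len(arms) < m:
            r, acc = random.random(), 0.0
            for i, p in enumerate(probs):
                acc += p
                if r <= acc:
                    arms.add(i)
                    break

        for i in arms:
            loss = loss_fn(t, i)           # observe losses of played arms only
            total_loss += loss
            estimate = loss / probs[i]     # importance-weighted loss estimate
            weights[i] *= math.exp(-eta * estimate / m)
    return total_loss
```

The O(n log n) per-iteration cost quoted in the excerpt comes from the sorting/thresholding step of the dependent-rounding sampler used by the actual algorithm, which the naive sampling loop above sidesteps for readability.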
“…These studies seldom consider the uncertainty of users' behaviours, so this paper introduces an online learning method called multi-armed bandits (MAB) to solve the problem. MAB has shown effectiveness and merit in air conditioning demand aggregation [16] and many other sequential decision-making problems containing uncertain/unknown behavioural factors [17][18][19][20][21][22][23][24][25][26][27]. In reference [28], an adversarial MAB framework is applied to learn the signal response of thermal control loads for demand response in real-time.…”
Section: Introduction
confidence: 99%