2019
DOI: 10.1109/tsp.2019.2928952

Minimax Optimal Algorithms for Adversarial Bandit Problem With Multiple Plays

Abstract: We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching m-arm strategy with minimax optimal regret bounds. To construct our algorithm, we introduce a new expert advice algorithm for the multiple-play setting. By using our expert advice algorithm, we additionally improve the best-known high-probability bound for the multi-play setting by O(√m). Our results are guar…

Cited by 14 publications (14 citation statements)
References 35 publications
“…We study the multi-armed bandit problem in an online setting, where we sequentially operate on a stream of observations from an adversarial environment [54], i.e., we have no statistical assumptions on the loss sequence. To this end, we investigate the multi-armed bandit problem from a competitive algorithm perspective [13], [55]- [60].…”
Section: B. Adversarial Multi-Armed Bandit Problem
confidence: 99%
“…Several algorithms have been introduced to solve the CSB problem that arises from K = S n,k . This problem has been called the k-set problem [Combes et al., 2015], unordered slate [Kale et al., 2010], or bandits with multiple plays [Uchiya et al., 2010, Vural et al., 2019].…”
Section: Minimax Learning on the Capped Simplex
confidence: 99%
“…We use the EXP4.MP algorithm [Vural et al., 2019] for the p-player, which is a variation of EXP4 [Auer et al., 2002]. Each iteration of EXP4.MP has a computational cost of O(n log(n)) and a storage cost of O(n), with a high-probability regret bound of O(√(knT log(n/δ))).…”
Section: EXP4.MP
confidence: 99%
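The excerpt above describes an exponential-weights method that plays m arms per round under semi-bandit feedback. The sketch below is a simplified illustration of that family of updates (weighted sampling of m distinct arms, importance-weighted loss estimates on the observed arms only); it is not the paper's EXP4.MP, and the function name, the rejection-style sampling loop, and the learning-rate choice are all illustrative assumptions.

```python
import math
import random


def exp3_multiple_plays(n_arms, m, T, loss_fn, eta=0.1):
    """Illustrative EXP3-style learner that plays m distinct arms per round.

    Semi-bandit feedback: only the losses of the m played arms are observed,
    and each observed loss is importance-weighted by its selection probability
    before the exponential-weight update. Simplified sketch, not EXP4.MP.
    """
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]

        # Draw m distinct arms by repeated categorical sampling
        # (real algorithms use a dependent-rounding scheme instead).
        arms = set()
        while len(arms) < m:
            r, acc = random.random(), 0.0
            for i, p in enumerate(probs):
                acc += p
                if r <= acc:
                    arms.add(i)
                    break

        for i in arms:
            loss = loss_fn(t, i)           # observe losses of played arms only
            total_loss += loss
            estimate = loss / probs[i]     # importance-weighted loss estimate
            weights[i] *= math.exp(-eta * estimate / m)
    return total_loss
```

The O(n log n) per-iteration cost quoted in the excerpt comes from the sorting/thresholding step of the dependent-rounding sampler used by the actual algorithm, which the naive sampling loop above sidesteps for readability.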
“…These studies seldom consider the uncertainty of users' behaviours, so this paper introduces an online learning method called multi-armed bandits (MAB) to solve the problem. MAB has shown effectiveness and merit in air conditioning demand aggregation [16] and many other sequential decision-making problems containing uncertain/unknown behavioural factors [17][18][19][20][21][22][23][24][25][26][27]. In reference [28], an adversarial MAB framework is applied to learn the signal response of thermal control loads for demand response in real-time.…”
Section: Introduction
confidence: 99%