2020
DOI: 10.1017/9781108571401

Bandit Algorithms

Abstract: Bandit problems were introduced by William R. Thompson in an article published in 1933 in Biometrika. Thompson was interested in medical trials and the cruelty of running a trial blindly, without adapting the treatment allocations on the fly as the drug appears more or less effective. The name comes from the 1950s, when Frederick Mosteller and Robert Bush decided to study animal learning and ran trials on mice and then on humans. The mice faced the dilemma of choosing to go l…

[Figure 1.1: Mouse learning a T-maze.]
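
The adaptive-allocation idea the abstract credits to Thompson is what is now called Thompson sampling. Below is a minimal illustrative sketch of it for a two-armed Bernoulli bandit; the code and the arm means are made up for illustration and are not taken from the book.

```python
# Illustrative Thompson sampling for a two-armed Bernoulli bandit.
# Each arm keeps a Beta posterior over its success probability; we
# sample from each posterior and play the arm with the best sample,
# so the allocation adapts as one "treatment" looks more effective.
import random

def thompson_sampling(true_means, horizon, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior on each arm
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm; play the best sample.
        samples = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Hypothetical success rates for the two "treatments".
print(thompson_sampling([0.45, 0.55], horizon=1000))
```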

Cited by 1,184 publications (1,387 citation statements); references 167 publications.
Citation statements: 17 supporting, 1,370 mentioning, 0 contrasting; ordered below by relevance.

“…where the division of δ by n is in accordance with a union bound over the n arms.¹ A zero-mean random variable Z is sub-Gaussian with parameter σ if E[e^{λZ}] ≤ exp(λ²σ²/2)…”
Section: A Auxiliary Results · mentioning · confidence: 99%
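
For context, the δ/n split follows the standard union-bound argument; the sketch below is not quoted from the citing paper and assumes each arm's empirical mean μ̂ᵢ is built from m i.i.d. σ-sub-Gaussian samples with true mean μᵢ.

```latex
% Per-arm tail bound: each confidence interval fails with probability at most delta/n.
\[
  \Pr\!\Bigl(|\hat\mu_i - \mu_i| \ge \sigma\sqrt{\tfrac{2\log(2n/\delta)}{m}}\Bigr)
  \;\le\; \frac{\delta}{n}, \qquad i = 1, \dots, n.
\]
% Union bound over the n arms: all intervals hold simultaneously
% with probability at least 1 - delta.
\[
  \Pr\!\Bigl(\exists\, i \le n :\, |\hat\mu_i - \mu_i| \ge \sigma\sqrt{\tfrac{2\log(2n/\delta)}{m}}\Bigr)
  \;\le\; \sum_{i=1}^{n} \frac{\delta}{n} \;=\; \delta.
\]
```
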
“…The literature on theory and algorithms for MAB problems is extensive; see [1], [7] for recent overviews. One of the main defining features of such problems is the distinction between stochastic vs. adversarial rewards; this paper focuses exclusively on the former.…”
mentioning · confidence: 99%

“…Let ρ(t) be the maximum instantaneous estimation error of the multi-objective reward in trial t, where ρ(t) = max_{i∈[I], a∈A} |r̂_a^i(t) − r_a^i(t)|. Based on existing studies [Lattimore and Szepesvári, 2018], we can show that ρ(t) is bounded with high probability. When the instantaneous error in estimating multi-objective rewards is bounded, we can upper bound the regret of a multi-armed bandit instance running in a specific rank as O(I…”
Section: Discussion · mentioning · confidence: 99%
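
As a generic sketch of why a bound on ρ(t) yields a regret bound (a paraphrase of the standard argument, not the cited paper's derivation): if the played arm a_t maximizes the current estimate r̂ and every estimate is within ρ(t) of the true reward, the per-round regret against the best arm a* is at most 2ρ(t).

```latex
\[
  r_{a^\star}(t) - r_{a_t}(t)
  \;\le\; \bigl(\hat r_{a^\star}(t) + \rho(t)\bigr) - \bigl(\hat r_{a_t}(t) - \rho(t)\bigr)
  \;\le\; 2\rho(t),
\]
% using \hat r_{a_t}(t) \ge \hat r_{a^\star}(t), since a_t maximizes the estimate;
% summing over t then bounds the cumulative regret by 2 * sum_t rho(t).
```
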
“…An excellent book on multi-armed bandits, Lattimore and Szepesvári (2019), will appear later this year. This book is much larger than ours; it provides a deeper treatment for a number of topics, and omits a few others.…”
Section: Preface · mentioning · confidence: 99%