Multi-Armed Bandits: Theory and Applications to Online Learning in Networks

Zhao, Qing

doi:10.2200/s00941ed2v01y201907cnt022

Cited by 24 publications

(20 citation statements)

References 137 publications

“…In this section, we review related literature, focusing on relatively recent work. We refer the reader to the recent monograph [42] on multi-armed bandits, which discusses both the MABP and the MARBP and their widespread applications.…”

Section: Review of Related Literature

mentioning

confidence: 99%

A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index

Niño‐Mora

2020

Mathematics

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
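The priority-index heuristic mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the Whittle index values have already been computed for every state of every project; it does not implement the fast-pivoting algorithm itself. The function name `whittle_priority_policy`, the `index_table` layout, and the budget `m` are hypothetical names chosen for illustration and do not come from the paper.

```python
from typing import List, Sequence


def whittle_priority_policy(
    index_table: Sequence[Sequence[float]],
    states: Sequence[int],
    m: int,
) -> List[int]:
    """Pick the m projects whose current states have the largest index.

    index_table[k][s] is the (precomputed, assumed given) index of
    project k in state s; states[k] is the current state of project k.
    """
    # Look up the index of each project in its current state.
    current = [index_table[k][s] for k, s in enumerate(states)]
    # Rank projects by index, highest first; sorted() is stable, so ties
    # are broken in favour of the lower project number.
    ranked = sorted(range(len(states)), key=lambda k: current[k], reverse=True)
    # Activate the top m projects; the rest stay passive.
    return sorted(ranked[:m])


# Illustrative numbers only: three two-state projects, activate two of them.
table = [[0.1, 0.5], [0.3, 0.2], [0.0, 0.9]]
active = whittle_priority_policy(table, states=[1, 0, 1], m=2)
```

In this sketch the policy is re-evaluated at every decision epoch with the projects' updated states, which is what makes it a priority-index rule rather than a fixed schedule.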

“…To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest. ¹Zeroth-order feedback signifies observations from f, in contrast to first-order feedback, which refers to observations from the gradient of f, as, e.g., in stochastic gradient descent [see, e.g., Agarwal et al., 2011; Vakili and Zhao, 2019].…”

mentioning

confidence: 92%

Optimal Order Simple Regret for Gaussian Process Bandits

Vakili

Bouziani

Jalali et al.

2021

Preprint

Consider the sequential optimization of a continuous, possibly non-convex, and expensive-to-evaluate objective function f. The problem can be cast as a Gaussian Process (GP) bandit where f lives in a reproducing kernel Hilbert space (RKHS). The state-of-the-art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When N is the number of exploration trials and γ_N is the maximal information gain, we prove an Õ(√(γ_N/N)) bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest. ¹Zeroth-order feedback signifies observations from f, in contrast to first-order feedback, which refers to observations from the gradient of f, as, e.g., in stochastic gradient descent [see, e.g., Agarwal et al., 2011; Vakili and Zhao, 2019].

“…We develop a controlled testing methodology to control the spread of the COVID-19 pandemic based on a large-scale stochastic model. Controlled sensing, a.k.a. active sensing, is based on classic sequential experimental design theory [17, 18], and has attracted growing attention in recent years in various hypothesis testing and dynamic search problems [19–23]. Controlled sensing policies have also been used to identify influence in social networks [24], as well as to learn the dynamics in general networks [25].…”

Section: Introduction

mentioning

confidence: 99%

Suppressing the impact of the COVID-19 pandemic using controlled testing and isolation

Cohen

Leshem

2021

Sci Rep

The coronavirus disease has significantly affected the lives of people around the world. Existing quarantine policies led to large-scale lockdowns because of the slow tracking of the infection paths, and indeed we see new waves of the disease. This can be solved by contact tracing combined with efficient testing policies. Since the number of daily tests is limited, it is crucial to exploit them efficiently to improve the outcome of contact tracing (technological or human-based epidemiological investigations). We develop a controlled testing framework to achieve this goal. The key is to test individuals with a high probability of being infected to identify them before symptoms appear. These probabilities are updated based on contact tracing and test results. We demonstrate that the proposed method could reduce the quarantine and morbidity rates compared to existing methods by up to 50%. The results clearly demonstrate the necessity of accelerating the epidemiological investigations by using technological contact tracing. Furthermore, proper use of the testing capacity using the proposed controlled testing methodology leads to significantly improved results under both small and large testing capacities. We also show that for small new outbreaks controlled testing can prevent the large spread of new waves. Author contributions statement: The authors contributed equally to this work, including conceptualization, analysis, methodology, software, and drafting the work.
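The two steps the abstract names, testing the individuals most likely to be infected and updating those probabilities from test results, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the update rule, so the sketch assumes a plain Bayes update with hypothetical test sensitivity and specificity values, and it leaves out the contact-tracing part of the update entirely. The function names `select_for_testing` and `update_after_test` are invented for this example.

```python
from typing import Dict, List


def select_for_testing(prob: Dict[str, float], capacity: int) -> List[str]:
    """Return the `capacity` individuals with the highest infection probability."""
    ranked = sorted(prob, key=prob.get, reverse=True)
    return ranked[:capacity]


def update_after_test(
    p: float,
    positive: bool,
    sensitivity: float = 0.9,   # assumed value, not from the paper
    specificity: float = 0.99,  # assumed value, not from the paper
) -> float:
    """Posterior infection probability after one test result (Bayes' rule)."""
    if positive:
        num = sensitivity * p
        den = num + (1.0 - specificity) * (1.0 - p)
    else:
        num = (1.0 - sensitivity) * p
        den = num + specificity * (1.0 - p)
    # Guard against a zero denominator (e.g. p == 0 with a perfect test).
    return num / den if den > 0 else p


# Illustrative numbers only: three people, two tests available today.
beliefs = {"a": 0.2, "b": 0.7, "c": 0.5}
tested = select_for_testing(beliefs, capacity=2)
beliefs["b"] = update_after_test(beliefs["b"], positive=True)
beliefs["c"] = update_after_test(beliefs["c"], positive=False)
```

A positive result pushes the belief towards 1 and a negative one towards 0, so the ranking used for the next day's tests changes as results arrive.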
