2022
DOI: 10.48550/arxiv.2204.00706
Preprint

Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

Abstract: We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. Each arm is associated with an unknown law on safety risks and rewards, and the learner's goal is to maximise reward whilst not playing unsafe arms, as determined by a given threshold on the mean risk. We formulate a pseudo-regret for this setting that enforces this safety constraint in a per-round way by softly penalising any violation, regardless of the gain in reward due to the same. …
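The per-round, softly penalised pseudo-regret described in the abstract can be sketched numerically. This is a minimal illustration, not the paper's exact definition: the function name and the `penalty` weight are hypothetical, and the comparator (best mean reward among safe arms) is assumed from the problem description.

```python
import numpy as np

def safety_penalised_pseudo_regret(mean_rewards, mean_risks, plays,
                                   threshold, penalty=1.0):
    """Sketch of a per-round risk-penalised pseudo-regret.

    Every play of an arm whose mean risk exceeds the threshold is
    softly penalised, regardless of any reward gained by that play.

    mean_rewards, mean_risks : per-arm means (unknown to the learner)
    plays     : sequence of arm indices chosen over the horizon
    threshold : safety threshold on the mean risk
    penalty   : weight on per-round violations (hypothetical knob)
    """
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    mean_risks = np.asarray(mean_risks, dtype=float)
    safe = mean_risks <= threshold
    # The comparator is the best mean reward among the safe arms.
    best_safe_reward = mean_rewards[safe].max()
    regret = 0.0
    for arm in plays:
        # Reward shortfall relative to the best safe arm ...
        regret += best_safe_reward - mean_rewards[arm]
        # ... plus a soft per-round penalty for any risk violation.
        regret += penalty * max(0.0, mean_risks[arm] - threshold)
    return regret
```

With two arms (safe arm: reward 1.0, risk 0.1; unsafe arm: reward 2.0, risk 0.9) and threshold 0.5, playing the unsafe arm gains reward but incurs a penalty in every such round, so for a large enough `penalty` the unsafe plays dominate the pseudo-regret.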

Cited by 1 publication (8 citation statements)
References 13 publications
“…The bound above behaves inversely with respect to the three gaps, as well as with respect to ε. The following lower bound argues, via a reduction to the safe multi-armed bandit problem [CGS22], that the dependence on min(Δ_I, Γ_I) in the above is tight for consistent algorithms (§D.5).…”
Section: Note
confidence: 99%
“…Here, each constraint is associated with a notion of regret S^i_T = Σ_{t≤T} (⟨a_i, x_t⟩ − α_i), and the overall regret is measured as max((max_i S^i_T), Σ_{t≤T} ⟨θ, x* − x_t⟩). As noted by Pacchiano et al. [PGBJ21] and Chen et al. [CGS22], the main disadvantage of this formulation from our perspective arises from the fact that constraint violations are aggregated. This functionally means that it is acceptable for effective algorithms to alternate between actions that have large reward and poor safety, and actions that have poor reward but good safety.…”
Section: A. A More Detailed Look at Related Work
confidence: 99%
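The aggregation issue raised in the citation statement above can be made concrete with a small numeric sketch. This is an illustration under assumed values, not the formulation of any cited paper: an aggregated violation measure sums signed excess risk over all rounds, so safe rounds cancel unsafe ones, whereas a per-round measure charges each violating round separately.

```python
import numpy as np

def aggregated_violation(risks, threshold):
    """Aggregated constraint regret: signed excess risk summed over all
    rounds, so under-threshold rounds can offset over-threshold ones."""
    return float(np.sum(np.asarray(risks) - threshold))

def per_round_violation(risks, threshold):
    """Per-round accounting: each violating round is penalised on its
    own, so safe rounds cannot offset unsafe ones."""
    return float(np.sum(np.maximum(np.asarray(risks) - threshold, 0.0)))

# Alternating between a very unsafe action (risk 1.0) and a very safe
# one (risk 0.0) with threshold 0.5: aggregation reports no violation
# at all, while per-round accounting charges every unsafe play.
risks = [1.0, 0.0] * 5
print(aggregated_violation(risks, 0.5))  # 0.0
print(per_round_violation(risks, 0.5))   # 2.5
```

This is exactly the alternation pathology the quoted passage describes: an algorithm that flips between a high-reward unsafe action and a low-reward safe action looks perfectly safe under the aggregated measure.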