Generalized Linear Bandits with Safety Constraints

Amani, Sanae; Alizadeh, Mahnoosh; Thrampoulidis, Christos

doi:10.1109/icassp40776.2020.9054063

Cited by 7 publications

(16 citation statements)

References 16 publications

(9 reference statements)

“…The assumption warrants need for a safe starting point which is readily available in most practical problems of interest. Similar assumptions can be found in previous literature on safe linear bandits [53], [54], safe convex and non-convex optimization [55], [56], and safe online convex optimization [15].…”

Section: A Assumptions

mentioning

confidence: 62%

“…d) IV) Safe Online Optimization: Safe optimization is a fairly nascent field with only a few works studying per-time safety in optimization problems. [53], [54] study the problem of safe linear bandits, giving O(log(T)…”

Section: B Contributions

mentioning

confidence: 99%

“…T) regret with no constraint violation, albeit under an assumption that a lower bound on the distance between the optimal action and the safe set's boundary is known. Without knowledge of such a lower bound, [53] show O(log(T) T^{2/3}) regret. Safe convex and non-convex optimization is studied in [55], [56].…”

mentioning

confidence: 99%

On Online Optimization: Dynamic Regret Analysis of Strongly Convex and Smooth Problems

Chang

Shahrampour

2021

AAAI

The regret bound of dynamic online learning algorithms is often expressed in terms of the variation in the function sequence (V_T) and/or the path-length of the minimizer sequence after T rounds. For strongly convex and smooth functions, Zhang et al. (2017) establish the squared path-length of the minimizer sequence (C*_{2,T}) as a lower bound on regret. They also show that online gradient descent (OGD) achieves this lower bound using multiple gradient queries per round. In this paper, we focus on unconstrained online optimization. We first show that a preconditioned variant of OGD achieves O(min{C*_T,C*_{2,T}}) with one gradient query per round (C*_T refers to the normal path-length). We then propose the online optimistic Newton (OON) method for the case when the first- and second-order information of the function sequence is predictable. The regret bound of OON is captured via the quartic path-length of the minimizer sequence (C*_{4,T}), which can be much smaller than C*_{2,T}. We finally show that by using multiple gradients for OGD, we can achieve an upper bound of O(min{C*_{2,T},V_T}) on regret.
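The quantities named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions C*_T = sum of ||x*_t - x*_{t-1}|| and C*_{2,T} = sum of ||x*_t - x*_{t-1}||^2 over consecutive minimizers, and it runs plain OGD with one gradient query per round on the hypothetical losses f_t(x) = 0.5 ||x - c_t||^2. It is not the preconditioned OGD variant or the OON method from the paper; the function names, the step size `eta`, and the drifting minimizer sequence are all illustrative assumptions.

```python
import math


def path_lengths(minimizers):
    """Return (C_T, C_2T): path-length and squared path-length
    of a sequence of minimizers (assumed standard definitions)."""
    c1 = 0.0
    c2 = 0.0
    for prev, cur in zip(minimizers, minimizers[1:]):
        step = math.dist(prev, cur)  # Euclidean distance between consecutive minimizers
        c1 += step
        c2 += step ** 2
    return c1, c2


def ogd_dynamic_regret(minimizers, eta=0.5, x0=None):
    """Run plain OGD on f_t(x) = 0.5 * ||x - c_t||^2 (illustrative losses)
    and return the dynamic regret against the per-round minimizers c_t."""
    x = list(x0) if x0 is not None else [0.0] * len(minimizers[0])
    regret = 0.0
    for c in minimizers:
        # f_t(x_t) - f_t(c_t); the second term is zero for this loss.
        regret += 0.5 * math.dist(x, c) ** 2
        # One gradient query per round: grad f_t(x) = x - c_t.
        x = [xi - eta * (xi - ci) for xi, ci in zip(x, c)]
    return regret


# Hypothetical example: a minimizer drifting slowly along the unit circle.
T = 200
minimizers = [(math.cos(0.01 * t), math.sin(0.01 * t)) for t in range(T)]
c1, c2 = path_lengths(minimizers)
regret = ogd_dynamic_regret(minimizers, eta=0.5, x0=minimizers[0])
print(f"C_T={c1:.4f}  C_2T={c2:.6f}  dynamic regret={regret:.6f}")
```

When consecutive minimizers move by less than one unit per round, as in this example, each squared step is smaller than the step itself, so the squared path-length is smaller than the path-length.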

“…In contrast to our setting, the paper [Usmanova et al, 2019] requires multiple measurements of the constraint at each round of the algorithm. Other closely related works of [Amani et al, 2019, Amani et al, 2020] study the problem of safe linear and generalized linear stochastic bandits where the constraint and loss functions depend linearly (directly or via a link function) on an unknown parameter. In fact, our algorithm can be seen as an extension of Safe-LUCB proposed by [Amani et al, 2019] to safe GPs.…”

Section: Related Work

mentioning

confidence: 99%

Regret Bounds for Safe Gaussian Process Bandit Optimization

Amani

Alizadeh

Thrampoulidis

2020

Preprint

Self Cite

Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints. In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs) first considered in [Srinivas et al., 2010]. We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round. The algorithm has two distinct phases. The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule. Our main contribution is to derive the first sub-linear regret bounds for this problem. We numerically compare SGP-UCB against existing safe Bayesian GP optimization algorithms.

“…Perhaps closest to our work is that of the setting in which there exist two distributions, one over rewards for actions, and one over costs. The goal is to maximize the expected reward, while ensuring that the expected cost of the selected action is below a certain threshold (Amani et al, 2019; Moradipari et al, 2021; Pacchiano et al, 2021). Crucially none of these frameworks allow for observing the constraint only on an uncontrolled subset of the rounds, which is a key challenge of the CBUS setting.…”

Section: Introduction

mentioning

confidence: 99%

Leveraging User-Triggered Supervision in Contextual Bandits

Agarwal

Gentile

Marinov

2023

Preprint

We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context. Such an interaction arises, for example, in text prediction or autocompletion settings, where a poor suggestion is simply ignored and the user enters the desired text instead. Crucially, this extra feedback is user-triggered on only a subset of the contexts. We develop a new framework to leverage such signals, while being robust to their biased nature. We also augment standard CB algorithms to leverage the signal, and show improved regret guarantees for the resulting algorithms under a variety of conditions on the helpfulness of and bias inherent in this feedback.
