2021 IEEE International Symposium on Information Theory (ISIT)
DOI: 10.1109/isit45174.2021.9518176

Regret Bounds for Safe Gaussian Process Bandit Optimization

Cited by 8 publications (6 citation statements)
References 5 publications
“…variance, the BQ noisy lower bound is known to be Ω(T^{-1/2}) (Plaskota 1996; Cai, Lam, and Scarlett 2023), and the BO noisy lower bound is known as ε = Ω(T^{-ν/(2ν+d)}) (Scarlett, Bogunovic, and Cevher 2017; Cai and Scarlett 2021). To compare against these, for the first dot point in Theorem 2, we consider the following specific c values:…”
Section: Lower Bounds (mentioning)
confidence: 99%
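
For context, the two bounds quoted above can be written out more explicitly. This is a reading of the flattened notation in the quote, not a statement lifted from the citing paper; the labels ε_BQ and ε_BO are introduced here for clarity.

```latex
% Reading of the bounds referenced above (the labels \epsilon_{BQ}, \epsilon_{BO} are ours).
% Noisy Bayesian quadrature: integration error after T noisy evaluations
\epsilon_{\mathrm{BQ}}(T) = \Omega\bigl(T^{-1/2}\bigr)
% Noisy Bayesian optimization with a Matern-\nu kernel in d dimensions: simple regret after T rounds
\epsilon_{\mathrm{BO}}(T) = \Omega\bigl(T^{-\frac{\nu}{2\nu + d}}\bigr)
% Equivalently, reaching simple regret \epsilon in the noisy BO setting requires
T = \Omega\bigl((1/\epsilon)^{\,2 + d/\nu}\bigr) \ \text{queries.}
```
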
“…This is a quite mild assumption since it only requires that one can find a probability distribution over the set of actions under which the expected cost is less than a strictly negative value. This is in sharp contrast to existing KB algorithms for hard constraints that typically require the existence of an initial safe action (Sui et al., 2018; Amani et al., 2020).…”
Section: Problem Formulation and Preliminaries (mentioning)
confidence: 99%
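
The distinction drawn in this quote, a distribution with strictly negative expected cost versus a known initial safe action, can be made concrete with a small numerical check. This is a hedged sketch: the names `costs`, `weights`, and `margin` are invented for illustration and do not come from the cited papers.

```python
import numpy as np

# Sketch of the "mild" (Slater-type) condition from the quote above: there exists a
# probability distribution over the actions whose expected cost is strictly negative.
# Hard-constraint algorithms instead assume a specific action already known to be safe.
def soft_condition_holds(costs, weights, margin=0.0):
    """True if the mixture `weights` over actions has expected cost below -margin."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize to a probability distribution
    return float(weights @ np.asarray(costs)) < -margin

# Toy example: per-action expected costs and a candidate mixing distribution.
costs = [0.5, -1.5, 0.8]
weights = [0.2, 0.6, 0.2]
print(soft_condition_holds(costs, weights))        # True: expected cost = -0.64
```
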
“…To this end, there have been exciting recent advances in the theoretical analysis of constrained kernelized bandits. In particular, (Sui et al., 2015; Berkenkamp et al., 2016; Sui et al., 2018) propose algorithms with convergence guarantees, while (Amani et al., 2020) is, to the best of our knowledge, the first work that establishes regret bounds for their developed algorithm, although under the Bayesian-type setting. These algorithms mainly focus on KB with a hard constraint such as safety, i.e., the selected action in each round needs to satisfy the constraint with high probability.…”
Section: Introduction (mentioning)
confidence: 99%
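
To illustrate the kind of hard-constraint rule this quote refers to (an action is selected only if it satisfies the constraint with high probability), here is a minimal sketch using confidence bounds from fitted Gaussian processes. It is a generic illustration under assumed conventions (cost ≤ 0 means safe, `beta` is a placeholder confidence parameter), not the algorithm of Sui et al. (2018) or Amani et al. (2020).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def safe_ucb_select(candidates, X_obs, reward_obs, cost_obs, beta=2.0):
    """Pick the candidate with the best optimistic reward among those whose
    cost upper confidence bound is non-positive (safe with high probability)."""
    reward_gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2).fit(X_obs, reward_obs)
    cost_gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2).fit(X_obs, cost_obs)

    r_mu, r_sd = reward_gp.predict(candidates, return_std=True)
    c_mu, c_sd = cost_gp.predict(candidates, return_std=True)

    safe = (c_mu + beta * c_sd) <= 0.0         # pessimistic cost estimate must certify safety
    if not np.any(safe):
        return None                            # no action is certified safe
    ucb = r_mu + beta * r_sd                   # optimistic reward estimate
    idx = np.flatnonzero(safe)
    return idx[np.argmax(ucb[idx])]

# Toy usage: 1-D action space, a few observed (reward, cost) pairs.
candidates = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
X_obs = np.array([[0.1], [0.4], [0.8]])
reward_obs = np.array([0.2, 0.9, 0.5])
cost_obs = np.array([-0.5, -0.1, 0.3])
print(safe_ucb_select(candidates, X_obs, reward_obs, cost_obs))
```
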
“…The only known results on safe exploration in multi-armed bandits address the case with continuous, convex arm spaces and convex constraints. The learner can converge to the optimal solution in these settings without violating the constraints [16, 17]. Conversely, the case with discrete and/or non-convex arm spaces or non-convex constraints, such as ours, is unexplored in the literature so far.…”
Section: Related Work (mentioning)
confidence: 99%