2020
DOI: 10.48550/arxiv.2006.10185
Preprint

Stochastic Bandits with Linear Constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, et al.

Abstract: We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward over the course of $T$ rounds is maximum, and each of which has an expected cost below a certain threshold $\tau$. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an $O\!\left(\frac{d\sqrt{T}}{\tau - c_0}\right)$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold and the c…
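For concreteness, the objective and the regret guarantee stated in the abstract can be written roughly as follows; the shorthand $r_t$, $c_t$, and $\pi_t^\ast$ is ours, and the exact definitions, including that of $c_0$, follow the full text of the paper:

\max_{\pi_1,\dots,\pi_T} \; \sum_{t=1}^{T} \mathbb{E}\big[r_t(\pi_t)\big]
\quad \text{subject to} \quad \mathbb{E}\big[c_t(\pi_t)\big] \le \tau \ \text{ for all } t \in [T],

R_T \;=\; \sum_{t=1}^{T} \mathbb{E}\big[r_t(\pi_t^\ast) - r_t(\pi_t)\big] \;=\; O\!\left(\frac{d\sqrt{T}}{\tau - c_0}\right).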

Cited by 2 publications (20 citation statements)
References 14 publications (22 reference statements)
“…Since the computational complexity of the dual component depends on the number of constraints, but is independent of sizes of the contextual space, the action space, and the feature space, the overall computational complexity of our algorithm is similar to that of LinUCB in the unconstrained setting. Since our anytime cumulative constraint (2) is most similar to an anytime policy constraint in [31], we first compare our algorithm with OPLB proposed in [31]. OPLB needs to construct a safe policy set in each round.…”
Section: Main Contributions (mentioning)
confidence: 99%
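The LinUCB baseline referenced in this comparison is the standard unconstrained linear bandit algorithm; a minimal sketch of one round of it is given below for reference. The class name and the parameters alpha and lam are illustrative assumptions for this sketch, not taken from either paper.

import numpy as np

class LinUCB:
    # Minimal disjoint-model LinUCB for K arms with d-dimensional features.
    def __init__(self, d, alpha=1.0, lam=1.0):
        self.alpha = alpha            # exploration weight (assumed value)
        self.A = lam * np.eye(d)      # ridge-regularized design matrix
        self.b = np.zeros(d)          # running sum of reward-weighted features

    def select(self, arms):
        # arms: (K, d) array of feature vectors; returns the UCB-maximizing index.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b        # ridge estimate of the reward parameter
        bonus = np.sqrt(np.sum((arms @ A_inv) * arms, axis=1))
        return int(np.argmax(arms @ theta + self.alpha * bonus))

    def update(self, x, r):
        # Rank-one update with the pulled arm's features x and observed reward r.
        self.A += np.outer(x, x)
        self.b += r * x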
“…it imposes a cumulative constraint in every round. This anytime cumulative constraint is most similar to an anytime policy constraint in [31] because the average cost of a policy is close to its mean after the policy has been applied for many rounds and the process converges, so can be viewed as a cumulative constraint on actions over many rounds (like ours). Furthermore, when our anytime cumulative constraint (2) is satisfied, our learner guarantees that the time-average cost is below a threshold in every round.…”
Section: Introduction (mentioning)
confidence: 97%
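As a concrete reading of the anytime cumulative constraint described above (after every round, the time-average cost so far must stay below the threshold), a minimal sketch follows. The names costs and tau are assumptions for illustration, not notation from either paper.

import numpy as np

def satisfies_anytime_constraint(costs, tau):
    # costs: realized per-round costs c_1, ..., c_t observed so far.
    # The constraint holds iff the running average (1/t) * sum_{s<=t} c_s
    # stays at or below the threshold tau after every round.
    running_avg = np.cumsum(costs) / np.arange(1, len(costs) + 1)
    return bool(np.all(running_avg <= tau))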