2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel 2012
DOI: 10.1109/eeei.2012.6376912
Stochastic bandits with pathwise constraints

Cited by 2 publications
(2 citation statements)
References 12 publications
“…Multi-Armed Bandits (MABs) are a well-known framework in machine learning [7]. They capture the trade-off between exploration and exploitation in sequential decision problems, and have been used in the context of learning in CRNs over the last few years [13], [5, 6]. Classical bandit problems comprise an agent (user) repeatedly choosing a single option (arm) from a set of options whose characteristics are initially unknown, receiving a reward for every choice.…”
Section: Multi-Armed Bandits
confidence: 99%
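The classical bandit loop described above (repeatedly pick an arm, observe a reward, refine the estimate of that arm) can be illustrated with a standard epsilon-greedy baseline. This is an illustrative sketch only, not the algorithm of the cited paper; the Bernoulli arm means, horizon, and epsilon value are made-up parameters for the example.

```python
import random

def epsilon_greedy(arm_means, horizon=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms (illustrative sketch).

    With probability epsilon, explore a uniformly random arm;
    otherwise exploit the arm with the highest empirical mean.
    Returns (total reward, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running empirical mean reward per arm
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = 1 if rng.random() < arm_means[arm] else 0         # Bernoulli draw
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts

total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

After enough rounds, the pull counts concentrate on the arm with the highest (initially unknown) mean, which is exactly the exploration–exploitation trade-off the quoted passage describes.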
“…Using MABs to model CRNs was first suggested in [11], in a rather straightforward manner: the channels of a communication network simply correspond to the arms of a MAB. An extension that also takes operational constraints into account appears in [12].…”
Section: The CRN-MAB Framework
confidence: 99%
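The channel-as-arm correspondence in the quote above can be sketched with a standard UCB1 learner choosing among channels with unknown success rates. Again a hypothetical illustration, not the construction of [11] or [12]; the channel rates, horizon, and function name are assumptions of the example.

```python
import math
import random

def ucb_channel_selection(channel_rates, horizon=5000, seed=1):
    """UCB1 over CRN channels (illustrative sketch).

    Each channel is a bandit arm; the reward of a round is 1 for a
    successful transmission on the chosen channel (Bernoulli with an
    unknown success rate) and 0 otherwise. Returns pulls per channel.
    """
    rng = random.Random(seed)
    n = len(channel_rates)
    counts = [0] * n   # transmissions attempted per channel
    means = [0.0] * n  # empirical success rate per channel
    for t in range(1, horizon + 1):
        if t <= n:
            ch = t - 1  # initialization: try every channel once
        else:
            # UCB1 index: empirical mean plus confidence bonus
            ch = max(range(n),
                     key=lambda c: means[c] + math.sqrt(2 * math.log(t) / counts[c]))
        reward = 1 if rng.random() < channel_rates[ch] else 0
        counts[ch] += 1
        means[ch] += (reward - means[ch]) / counts[ch]
    return counts

counts = ucb_channel_selection([0.3, 0.9, 0.5])
```

The learner converges on the channel with the best success rate; modeling operational constraints (as in [12]) would restrict or penalize which channel may be chosen in each round, which this sketch does not attempt.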