We consider the problem of distributed online learning with multiple players in multi-armed bandit (MAB) models. Each player can pick among multiple arms. When a player picks an arm, it gets a reward. Any other communication between the players is costly and adds to the regret. We propose an online index-based distributed learning policy called the dUCB4 algorithm that trades off exploration versus exploitation in the right way, and achieves expected regret that grows at most as near-O(log² T). The motivation comes from opportunistic spectrum access by multiple secondary users in cognitive radio networks, wherein they must pick among various wireless channels that look different to different users. To the best of our knowledge, this is the first distributed learning algorithm for multi-player MABs.
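To make the index-based exploration-exploitation trade-off concrete, here is a minimal single-player UCB1-style sketch. This is only the classical index form, not the dUCB4 policy itself, which additionally handles multiple players, collisions, and communication costs; the function names and the two-armed Bernoulli usage below are illustrative assumptions.

```python
import math
import random

def ucb_indices(counts, means, t):
    """Classical UCB1 index: empirical mean plus an exploration bonus
    that shrinks as an arm is sampled more often. Unplayed arms get an
    infinite index so each arm is tried at least once."""
    return [m + math.sqrt(2 * math.log(t) / n) if n > 0 else float("inf")
            for m, n in zip(means, counts)]

def play_ucb(reward_fns, horizon, seed=0):
    """Run UCB1 for `horizon` rounds; each reward_fns[i](rng) draws a
    reward for arm i. Returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts, means = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        idx = ucb_indices(counts, means, t)
        arm = max(range(k), key=idx.__getitem__)   # play the highest index
        r = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running average
    return counts
```

With two Bernoulli arms of success probabilities 0.9 and 0.1, the better arm accumulates the vast majority of pulls over a long horizon, while the exploration bonus keeps the regret from suboptimal pulls growing only logarithmically.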
Index Terms: Distributed adaptive control, multi-armed bandit, online learning, multi-agent systems.
We consider the following learning problem motivated by opportunistic spectrum access in cognitive radio networks. There are N independent Gilbert-Elliott channels with possibly non-identical transition matrices. It is desired to have an online policy to maximize the long-term expected discounted reward from accessing one channel at each time dynamically. While there is a stream of recent results on this problem when the channels are identical, much less is known for the harder case of non-identical channels. We provide the first characterization of the structure of the optimal policy for this problem when the channels can be non-identical, in the Bayesian case (when the transition matrices are known). We also provide the first provably efficient learning algorithm for a non-Bayesian version of this problem (when the transition matrices are unknown). Specifically, for the special case of two positively correlated channels, we use the structure we identify to develop a novel mapping to a different multi-armed bandit with countably-infinite arms, in which each arm corresponds to a threshold-based policy. Using this mapping, we propose a policy that achieves near-logarithmic regret for this problem with respect to an ε-optimal solution.
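The building blocks behind the threshold-based policies above can be sketched as follows: a belief (the probability that a Gilbert-Elliott channel is currently in its good state) evolves under the channel's transition probabilities, and a threshold rule decides which channel to sense. This is a generic illustration under assumed notation (p01 = P(bad→good), p11 = P(good→good)), not the specific construction from the paper; the function names are ours.

```python
def belief_update(b, p01, p11, obs=None):
    """One-step update of the belief b = P(channel is good) for a
    Gilbert-Elliott channel. If the channel was sensed this slot, the
    next belief resets to the transition probability out of the
    observed state; otherwise the belief is propagated forward."""
    if obs is None:                      # channel not sensed this slot
        return b * p11 + (1 - b) * p01
    return p11 if obs else p01           # sensed good / sensed bad

def threshold_policy(b1, tau):
    """Sense channel 1 when its belief is at least the threshold tau,
    otherwise sense channel 2. Each choice of tau defines one policy;
    in the paper's mapping, each such policy plays the role of one arm
    in a countable-armed bandit."""
    return 1 if b1 >= tau else 2
```

For a channel with p01 = 0.2 and p11 = 0.8, an unobserved belief converges to the stationary probability p01 / (1 - p11 + p01) = 0.5, which is one reason belief trajectories of unobserved channels admit the monotone structure that threshold policies exploit.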