2012 IEEE 51st Conference on Decision and Control (CDC)
DOI: 10.1109/cdc.2012.6426587
Decentralized learning for multi-player multi-armed bandits

Abstract: We consider the problem of distributed online learning with multiple players in multi-armed bandit (MAB) models. Each player can pick among multiple arms; when a player picks an arm, it gets a reward. Any other communication between the users is costly and will add to the regret. We propose an online index-based distributed learning policy, the dUCB4 algorithm, that trades off exploration vs. exploitation in the right way and achieves expected regret that grows at most as near-O(log² T). The motivation co…
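The excerpt does not include the dUCB4 policy itself, but the index-based exploration/exploitation trade-off it builds on can be illustrated with a minimal single-player UCB-style sketch. Everything below (the `ucb_index` bonus form, the Bernoulli arms, the constants) is an illustrative assumption, not the paper's algorithm: each round the player computes an optimistic index per arm (empirical mean plus a confidence bonus that shrinks with pulls) and plays the arm with the largest index.

```python
import math
import random

def ucb_index(mean, count, t, scale=2.0):
    """Optimistic index: empirical mean plus an exploration bonus.

    The bonus grows with log(t) and shrinks as the arm is pulled more,
    so under-sampled arms are periodically re-explored.
    """
    return mean + math.sqrt(scale * math.log(t) / count)

def run_ucb(arm_means, horizon, seed=0):
    """Play a single-player UCB policy against Bernoulli arms.

    Returns per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n        # pulls per arm
    means = [0.0] * n       # running empirical means
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1     # initialize: play each arm once
        else:
            arm = max(range(n),
                      key=lambda a: ucb_index(means[a], counts[a], t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, total

counts, total = run_ucb([0.2, 0.5, 0.8], horizon=5000)
```

Over 5000 rounds the best arm (mean 0.8) accumulates the vast majority of pulls, which is the sense in which an index policy "trades off exploration vs. exploitation"; the multi-player, limited-communication setting of the paper layers a decentralized coordination scheme on top of indices of this kind.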


Cited by 21 publications (32 citation statements)
References 23 publications
“…As a consequence, orthogonalizing the players over the best arms may not be the optimal allocation. In [22], Kalathil et al. considered the case where arm ranks may be different across players. They proposed a decentralized policy that achieves regret under the i.i.d.…”
Section: Related Work on RMAB
confidence: 99%
“…• In our problem setting, both noise-limited and interference-limited transmission models are studied, and we do not impose any limitation on the interference pattern. This is in contrast with [16], [17] and [31], where the interference is either completely neglected or limited to the neighboring users. This is important since, in general, channel allocation based on interference avoidance is suboptimal.…”
Section: B. Our Contribution
confidence: 72%
“…This stands in contrast to [15], where the reward of each specific channel is assumed to be equal for all users, and only the availability is stochastic. • Unlike [18] and [31], our algorithm does not require information exchange.…”
Section: B. Our Contribution
confidence: 96%
“…In [16], Kalathil et al. studied the problem of distributed online learning with multiple players in multi-armed bandits and proposed an online index-based distributed learning policy. In [17], N-armed bandits have been applied in pay-per-click auctions for Internet advertising, while in [18] for truthful sponsored search auctions and in [19] for keyword selection in search-based advertising.…”
Section: Related Work
confidence: 99%