2018
DOI: 10.1109/tcns.2016.2635380
On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits

Cited by 50 publications (68 citation statements) · References 21 publications
“…Specifically, the players collide with one another in a specific pattern to convey their reward estimates or channel preferences. The dE3 algorithm in [15] employs the Bertsekas auction algorithm, which requires the users to exchange bids via collisions in order to win their preferred channels. The ESER [16] and M-ETC [17] algorithms allow users to exchange their observed mean values with others via collisions.…”
Section: Related Work
confidence: 99%
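The collide-to-communicate mechanism the excerpt describes can be illustrated with a minimal sketch. This is a hypothetical illustration of the general idea, not the actual ESER or M-ETC protocol: a player quantizes its mean-reward estimate into a fixed number of bits and signals each 1-bit by deliberately colliding in a designated slot, so an observer can reconstruct the estimate from the collision pattern alone.

```python
def encode_mean(mean, bits=8):
    """Quantize a mean estimate in [0, 1] to a list of bits.

    Each 1-bit corresponds to a slot in which the sender would
    deliberately collide; each 0-bit to a slot in which it stays silent.
    """
    level = min(int(mean * (2 ** bits)), 2 ** bits - 1)
    return [(level >> i) & 1 for i in reversed(range(bits))]


def decode_mean(bit_list):
    """Recover the quantized mean from the observed collision pattern."""
    level = 0
    for b in bit_list:
        level = (level << 1) | b
    # Map the quantization level back to the midpoint of its bin.
    return (level + 0.5) / (2 ** len(bit_list))


# The sender collides in the slots marked 1; the receiver observes which
# slots had collisions and reconstructs the estimate to within 2**-8.
pattern = encode_mean(0.62, bits=8)
received = decode_mean(pattern)
```

With 8 signaling slots the reconstruction error is at most one quantization bin, i.e. 2⁻⁸; more slots trade communication rounds for precision.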
“…Such an architecture can detect the presence of another user on its channel, either by experiencing a collision or by sensing, but it cannot estimate the number of collided or sensed users. Furthermore, the architecture cannot sense multiple channels simultaneously, which makes orthogonalization and establishing coordination far more challenging than in [5,15]. In Table I, we compare existing distributed algorithms with respect to various parameters.…”
Section: A. Radio Models
confidence: 99%
“…Our results suggest that the distributed learning dynamics in finite populations can be viewed as a novel distributed and approximate implementation of the MWU method. While parallelized implementations for solving the multi-armed bandit problem exist (see, e.g., [2,25,28,33,36]), in such works each node explicitly maintains a weight vector over all options. The most distinctive aspect of the distributed MWU interpretation of the learning dynamics we consider is that no such memory is required: the weights are represented implicitly by the popularity of the various options, and the sampling and adoption processes require almost no memory.…”
Section: Related Work
confidence: 99%
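For contrast with the memoryless dynamics the excerpt highlights, here is a minimal sketch of the explicit bookkeeping it refers to: each node keeps a weight per option and applies a multiplicative-weights update after observing losses. The function names and the learning rate `eta` are illustrative choices, not taken from the cited works.

```python
import random


def mwu_step(weights, losses, eta=0.1):
    """One multiplicative-weights update: scale each option's weight
    down in proportion to the loss it just incurred."""
    return [w * (1 - eta * l) for w, l in zip(weights, losses)]


def sample_option(weights):
    """Sample an option index with probability proportional to its weight."""
    r = random.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1


# Three options; option 0 consistently incurs the smallest loss, so its
# weight (and hence its sampling probability) comes to dominate.
weights = [1.0, 1.0, 1.0]
for _ in range(100):
    losses = [0.1, 0.5, 0.9]
    weights = mwu_step(weights, losses, eta=0.1)

best = max(range(3), key=lambda i: weights[i])
```

The point of the cited work is that the population dynamics approximate this update without any node storing `weights`: the weight of an option is encoded implicitly in how many agents currently hold it.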