2018
DOI: 10.1109/tcns.2016.2635380
On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits

Cited by 50 publications (68 citation statements) · References 21 publications
“…Specifically, the players collide with one another in a specific pattern to convey their reward estimates or channel preferences. The dE3 algorithm in [15] employs the Bertsekas auction algorithm, which requires the users to exchange bids via collisions in order to win their preferred channels. The ESER [16] and M-ETC [17] algorithms allow users to exchange their observed mean values with others via collisions.…”
Section: Related Work
confidence: 99%
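The collide-to-communicate mechanism the excerpt describes can be illustrated with a minimal sketch. This is a hypothetical illustration of the general idea, not the actual ESER or M-ETC protocol: a player quantizes its mean-reward estimate into a fixed number of bits and signals each 1-bit by deliberately colliding in a designated slot, so an observer can reconstruct the estimate from the collision pattern alone.

```python
def encode_mean(mean, bits=8):
    """Quantize a mean estimate in [0, 1] to a list of bits.

    Each 1-bit corresponds to a slot in which the sender would
    deliberately collide; each 0-bit to a slot in which it stays silent.
    """
    level = min(int(mean * (2 ** bits)), 2 ** bits - 1)
    return [(level >> i) & 1 for i in reversed(range(bits))]


def decode_mean(bit_list):
    """Recover the quantized mean from the observed collision pattern."""
    level = 0
    for b in bit_list:
        level = (level << 1) | b
    # Map the quantization level back to the midpoint of its bin.
    return (level + 0.5) / (2 ** len(bit_list))


# The sender collides in the slots marked 1; the receiver observes which
# slots had collisions and reconstructs the estimate to within 2**-8.
pattern = encode_mean(0.62, bits=8)
received = decode_mean(pattern)
```

With 8 signaling slots the reconstruction error is at most one quantization bin, i.e. 2⁻⁸; more slots trade communication rounds for precision.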
“…Such an architecture can detect the presence of another user on its channel, either by experiencing a collision or by sensing, but it cannot estimate the number of collided or sensed users. Furthermore, the architecture cannot sense multiple channels simultaneously, which makes orthogonalization and establishing coordination far more challenging than in [5,15]. In Table I, we compare existing distributed algorithms with respect to various parameters.…”
Section: A. Radio Models
confidence: 99%
“…Our results suggest that the distributed learning dynamics in finite populations can be viewed as a novel distributed and approximate implementation of the MWU method. While parallelized implementations for solving the multi-armed bandit problem exist (see, e.g., [2,25,28,33,36]), in such works each node explicitly maintains a weight vector over all options. The most distinctive aspect of the distributed MWU interpretation of the learning dynamics we consider is that no such memory is required: the weights are represented implicitly by the popularity of the various options, and the sampling and adoption processes require almost no memory.…”
Section: Related Work
confidence: 99%
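For contrast with the memoryless dynamics the excerpt highlights, here is a minimal sketch of the explicit bookkeeping it refers to: each node keeps a weight per option and applies a multiplicative-weights update after observing losses. The function names and the learning rate `eta` are illustrative choices, not taken from the cited works.

```python
import random


def mwu_step(weights, losses, eta=0.1):
    """One multiplicative-weights update: scale each option's weight
    down in proportion to the loss it just incurred."""
    return [w * (1 - eta * l) for w, l in zip(weights, losses)]


def sample_option(weights):
    """Sample an option index with probability proportional to its weight."""
    r = random.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1


# Three options; option 0 consistently incurs the smallest loss, so its
# weight (and hence its sampling probability) comes to dominate.
weights = [1.0, 1.0, 1.0]
for _ in range(100):
    losses = [0.1, 0.5, 0.9]
    weights = mwu_step(weights, losses, eta=0.1)

best = max(range(3), key=lambda i: weights[i])
```

The point of the cited work is that the population dynamics approximate this update without any node storing `weights`: the weight of an option is encoded implicitly in how many agents currently hold it.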