2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7952664

Multi-armed bandits in multi-agent networks

Cited by 59 publications (36 citation statements)
References 23 publications
“…In this work, we do not investigate learning strategies for repeated play, but only for the one-shot game in isolation. Also related is the multi-agent bandit problem [37], in which a team of agents has to agree on which arm to choose in a classical multi-armed bandit problem. The main difference with our setting is that we are not interested in considering the regret during learning, but only in learning good approximations as close as possible to the original action-value function.…”
Section: Definition
Citation type: mentioning (confidence: 99%)
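As context for the multi-agent bandit setting mentioned in the statement above, the following is a minimal, hypothetical sketch (not the algorithm of [37]) of a team of agents that must agree on a single arm of a classical multi-armed bandit each round; here agreement is modeled as a majority vote over each agent's local UCB1 proposal, and all variable names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_agents, horizon = 5, 3, 2000
true_means = rng.uniform(0.2, 0.8, size=n_arms)      # unknown Bernoulli means

# Per-agent empirical statistics (each agent keeps its own counts),
# initialized with one forced sample of every arm.
counts = np.ones((n_agents, n_arms))
means = rng.binomial(1, true_means, size=(n_agents, n_arms)).astype(float)

for t in range(n_arms, horizon):
    # Each agent proposes an arm via UCB1 on its own estimates.
    bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
    proposals = np.argmax(means + bonus, axis=1)

    # Team "agreement": majority vote over proposals (ties -> smallest index).
    arm = np.bincount(proposals, minlength=n_arms).argmax()

    # Every agent pulls the agreed arm and observes an independent reward.
    rewards = rng.binomial(1, true_means[arm], size=n_agents)
    means[:, arm] += (rewards - means[:, arm]) / (counts[:, arm] + 1)
    counts[:, arm] += 1

print("agreed-on arm:", arm, "best arm:", true_means.argmax())
```

With enough rounds the majority vote concentrates on the arm with the highest true mean, which is the kind of team agreement the quoted statement refers to.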
“…Most papers on MAMAB consider a set of agents pulling the same set of arms simultaneously, and in most of them the agents coordinate through real-time communication to collaboratively find the optimal policy, with a few exceptions (Bubeck et al., 2021). In the former, the communication resource is either costly (Madhushani and Leonard, 2020), limited by a budget (Lalitha and Goldsmith, 2021; Vial et al., 2021; Sankararaman et al., 2019; Chawla et al., 2020), or constrained through communication networks (Landgren et al., 2021; Shahrampour et al., 2017), so the main focus is on designing communication-efficient schemes that achieve the same performance as if there were no information asymmetry. In another related thread, referred to as the matching bandits problem, agents choosing the same arm collide and obtain zero reward (Kalathil et al., 2014); here and in a few other works (Shahrampour et al., 2017), different agents get different reward distributions from the same arm, whereas in the other referenced work they get independent and identically distributed samples from the same arm.…”
Section: Appendix A Supplementary Details
Citation type: mentioning (confidence: 99%)
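To make the network-constrained coordination described in that statement more concrete, here is a rough sketch, under illustrative assumptions, of agents running a UCB-style rule on reward statistics that are mixed with their graph neighbors' statistics once per round. It is written in the spirit of running-consensus schemes such as those of Landgren et al. and Shahrampour et al., but it does not reproduce either algorithm; the ring graph and the doubly stochastic mixing matrix W are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_arms, horizon = 4, 6, 3000

# Ring communication graph with a doubly stochastic mixing matrix W
# (illustrative; actual algorithms derive W from the real network).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

true_means = rng.uniform(0, 1, size=n_arms)

# Consensus estimates of total reward and total pulls, per agent and arm.
s_hat = np.zeros((n_agents, n_arms))
n_hat = np.zeros((n_agents, n_arms))

for t in range(1, horizon + 1):
    # Each agent picks an arm with a UCB-style rule on its local estimates.
    means = s_hat / np.maximum(n_hat, 1e-9)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(n_hat, 1e-9))
    arms = np.where(n_hat.min(axis=1) == 0,
                    (t - 1) % n_arms,                 # forced initial exploration
                    np.argmax(means + bonus, axis=1))

    rewards = rng.binomial(1, true_means[arms])

    # Local updates, then one gossip/consensus step over the graph.
    s_local = s_hat + np.eye(n_arms)[arms] * rewards[:, None]
    n_local = n_hat + np.eye(n_arms)[arms]
    s_hat, n_hat = W @ s_local, W @ n_local

print("best arm:", true_means.argmax(), "agents' current picks:", arms)
```

The single multiplication by W per round is the sense in which communication is "constrained through the network": each agent only ever combines information from its immediate neighbors.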
“…Another line of solutions is to enforce coordination among agents, essentially transforming a multi-agent system back to (or making it more similar to) a single-agent one. One way to achieve this is through communication (Shahrampour et al., 2017; Zhang et al., 2021), which introduces extra costs that may be intolerable in some cases. A more broadly applicable scheme is through the common information (CI) approach (Nayyar et al., 2013; Chang et al., 2021; Dibangoye and Buffet, 2018).…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Multi-Agent Multi-Armed Bandits. Existing decentralized cooperative MAB algorithms make one or more of the following assumptions: (i) agents independently interact with the same MAB (Lupu, Durand, and Precup 2019), (ii) they use sophisticated communication protocols to exchange information about rewards and the number of times actions were played (Landgren, Srivastava, and Leonard 2016; Martínez-Rubio, Kanade, and Rebeschini 2019; Sankararaman, Ganesh, and Shakkottai 2019; Shahrampour, Rakhlin, and Jadbabaie 2017), (iii) when sophisticated communication is not possible, agents share their latest action and reward (Madhushani and Leonard 2019). These assumptions are often required to simplify the analysis, but are not realistic in most human-AI interactions, e.g., collaborative transport, assembly, cooking, or autonomous driving, where (i) agents' actions influence the outcome for the whole team, (ii) they do not have explicit communication channels or might have different state and action representations that are difficult to communicate, or (iii) they have different capabilities, e.g., noisier sensors.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
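As a loose illustration of assumption (iii) in the statement above, where agents broadcast only their latest action and reward, the sketch below has each agent fold every teammate's most recent (arm, reward) pair into its own empirical means before selecting its next arm. This is a hypothetical example under assumed names and parameters, not the protocol of Madhushani and Leonard (2019).

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, n_arms, horizon = 3, 4, 1000
true_means = rng.uniform(0, 1, size=n_arms)

# Fully connected team: every agent hears every other agent's latest message.
counts = np.zeros((n_agents, n_arms))
sums = np.zeros((n_agents, n_arms))

for t in range(horizon):
    if t < n_arms:                       # initial round-robin exploration
        arms = np.full(n_agents, t)
    else:
        means = sums / counts
        bonus = np.sqrt(2 * np.log(t) / counts)
        arms = np.argmax(means + bonus, axis=1)

    rewards = rng.binomial(1, true_means[arms]).astype(float)

    # Each agent broadcasts only its latest (arm, reward) pair; everyone
    # incorporates all broadcast pairs into their empirical statistics.
    for a, r in zip(arms, rewards):
        counts[:, a] += 1
        sums[:, a] += r

print("estimated best arm:", (sums / counts).mean(axis=0).argmax(),
      "true best arm:", true_means.argmax())
```

Even this lightweight one-message-per-round protocol lets each agent accumulate statistics at roughly the team's combined rate, which is why the cited line of work treats it as the fallback when richer communication is unavailable.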