Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Kao, Hsu; Wei, Chen-Yu; Subramanian, Vijay

doi:10.48550/arxiv.2111.00781

Cited by 1 publication

(1 citation statement)

References 21 publications

(17 reference statements)


“…Another long line of work features collision models where rewards are lower if multiple agents simultaneously pull the same arm (e.g., [1,5,13,21,30,41,42,45,50]), unlike our model. Along these lines, other reward structures have been studied, such as reward being a function of the agents' joint action (e.g., [8,9,32]).…”

Section: Other Related Work (mentioning)

confidence: 99%

Robust Multi-Agent Bandits Over Undirected Graphs

Vial, Shakkottai, Srikant

2022

Preprint


We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) log(T)/∆) regret in this setting, where K is the number of arms and ∆ is the arm gap. For m ≪ K, this improves over the single-agent baseline regret of O(K log(T)/∆). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O((d_mal(i) + K/n) log(T)/∆) on any connected and undirected graph, where d_mal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where d_mal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the d_mal(i) malicious agents directly connected to i affect its long-term regret).
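To make the scaling in the abstract concrete, the following is a minimal sketch that evaluates only the leading terms of the three regret bounds, with all hidden constants dropped. The function names and the numeric values are illustrative assumptions, not taken from the paper, and the outputs are not actual regret values.

```python
import math


def single_agent_scale(K, T, gap):
    """Leading term K * log(T) / gap of the single-agent bound (constants dropped)."""
    return K * math.log(T) / gap


def complete_graph_scale(m, K, n, T, gap):
    """Leading term (m + K/n) * log(T) / gap of the complete-graph bound."""
    return (m + K / n) * math.log(T) / gap


def local_scale(d_mal_i, K, n, T, gap):
    """Leading term (d_mal(i) + K/n) * log(T) / gap of the per-agent bound."""
    return (d_mal_i + K / n) * math.log(T) / gap


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    K, n, m, T, gap = 100, 20, 5, 10**6, 0.1
    print("single agent  :", single_agent_scale(K, T, gap))
    print("complete graph:", complete_graph_scale(m, K, n, T, gap))
    # An honest agent with a single malicious neighbor (assumed d_mal(i) = 1).
    print("local, d_mal=1:", local_scale(1, K, n, T, gap))
```

With these assumed values, m is much smaller than K, so the complete-graph term is smaller than the single-agent term, and an agent with fewer malicious neighbors than m gets a smaller term still. Setting d_mal(i) = m recovers the complete-graph expression.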
