Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

Mao, Weichao; Zhang, Kaiqing; Miehling, Erik; Başar, Tamer

doi:10.1109/cdc42340.2020.9303801

Cited by 22 publications

(17 citation statements)

References 23 publications

“…As an example, Omidshafiei et al (2017) propose a decentralized MARL algorithm that uses RNNs to improve the agents' observability. Mao et al (2020) use an RNN to first compress the agents' histories into embeddings that are posteriorly fed into deep Q-networks, helping to improve agents' observability. The commonly used paradigm of centralized training with decentralized execution also contributes to alleviating partial observability at train time (Oliehoek et al, 2011; Rashid et al, 2018; Foerster et al, 2016; …”

Section: A1 Partial Observability in MARL (mentioning)

confidence: 99%

Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning

Santos, Carvalho, Vasco et al. 2022

Preprint

We introduce hybrid execution in multi-agent reinforcement learning (MARL), a new paradigm in which agents aim to successfully perform cooperative tasks with any communication level at execution time by taking advantage of information sharing among the agents. Under hybrid execution, the communication level can range from a setting in which no communication is allowed between agents (fully decentralized), to a setting featuring full communication (fully centralized). To formalize our setting, we define a new class of multi-agent partially observable Markov decision processes (POMDPs) that we name hybrid-POMDPs, which explicitly models a communication process between the agents. We contribute MARO, an approach that combines an autoregressive predictive model to estimate missing agents' observations, and a dropout-based RL training scheme that simulates different communication levels during the centralized training phase. We evaluate MARO on standard scenarios and extensions of previous benchmarks tailored to emphasize the negative impact of partial observability in MARL. Experimental results show that our method consistently outperforms baselines, allowing agents to act with faulty communication while successfully exploiting shared information.

“…[78] designs a neural network architecture, IPOMDPnet, which extends QMDP-net planning algorithm [79] to MARL settings under POMDP. Besides, [80] introduces the concept of information state embedding to compress agents' histories and proposes an RNN model combining the state embedding. Their method, i.e., embed-then-learn pipeline, is universal since the embedding can be fed into any existing partially observable MARL algorithm as the black-box.…”

Section: Vertical Federated Reinforcement Learning (mentioning)

confidence: 99%

Federated reinforcement learning: techniques, applications, and open challenges

Qi, Zhou, Lei et al. 2021

This paper presents a comprehensive survey of Federated Reinforcement Learning (FRL), an emerging and promising field in Reinforcement Learning (RL). Starting with a tutorial of Federated Learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data-privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e., Horizontal Federated Reinforcement Learning (HFRL) and Vertical Federated Reinforcement Learning (VFRL). We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL.

“…The quantization is done through the approximations as measured by Kullback-Leibler divergence (relative entropy) between probability density functions. Further recent studies include ([75]) and ([101]).…”

Section: Literature Review (mentioning)

confidence: 99%

“…([101]) presents a notion of approximate information variable and studies near optimality of policies that satisfies the approximate information state property. In ([75]), a similar problem is analyzed under a decentralized setup. Our explicit approximation results in this chapter will find applications in both of these studies.…”

Section: Literature Review (mentioning)

confidence: 99%