2020 · DOI: 10.1609/aaai.v34i05.6214

Multi-Agent Actor-Critic with Hierarchical Graph Attention Network

Abstract: Most previous studies on multi-agent reinforcement learning focus on deriving decentralized and cooperative policies to maximize a common reward, and rarely consider the transferability of trained policies to new tasks. This prevents such policies from being applied to more complex multi-agent tasks. To resolve these limitations, we propose a model that conducts both representation learning for multiple agents using a hierarchical graph attention network and policy learning using multi-agent actor-critic. The hierarchical…
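
The two-level attention scheme the abstract describes (attention over individual agents, then attention over group-level summaries) can be illustrated with a minimal sketch. This is not the authors' implementation: the single-head dot-product attention, the module names AttentionPool and HierarchicalAttentionEncoder, and all dimensions are assumptions made for illustration.

```python
# Minimal sketch of two-level (hierarchical) graph attention, under assumed
# names and shapes. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Single-head dot-product attention that pools a set of embeddings
    into one vector, conditioned on a query embedding."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, query, others):          # query: (dim,), others: (n, dim)
        scores = self.k(others) @ self.q(query) / others.shape[-1] ** 0.5
        weights = F.softmax(scores, dim=0)      # attention over the set
        return weights @ self.v(others)         # weighted sum: (dim,)

class HierarchicalAttentionEncoder(nn.Module):
    """Level 1: attend over the agents within each group.
    Level 2: attend over the resulting group summaries."""
    def __init__(self, obs_dim, dim=64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, dim)
        self.agent_att = AttentionPool(dim)     # within-group attention
        self.group_att = AttentionPool(dim)     # across-group attention

    def forward(self, self_obs, groups):        # groups: list of (n_i, obs_dim)
        h_self = self.embed(self_obs)
        group_embs = torch.stack(
            [self.agent_att(h_self, self.embed(g)) for g in groups])
        return self.group_att(h_self, group_embs)  # final embedding: (dim,)

enc = HierarchicalAttentionEncoder(obs_dim=8)
out = enc(torch.randn(8), [torch.randn(3, 8), torch.randn(2, 8)])
print(out.shape)  # torch.Size([64])
```

Stacking such layers, or using multiple heads, would bring this sketch closer to a full graph attention network; the hierarchy itself is just attention applied twice, at the agent level and then at the group level.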

Citations: cited by 88 publications (32 citation statements)
References: 19 publications
“…As such, we can regard the opponent sample complexity as communication load. To lower the opponent sample complexity, we replace the real opponents with learned opponent models in data simulation, which is analogous to selectively calling some (or none) of the opponents for useful information to reduce the communication load (or bandwidth) in multi-agent interactions [Ryu et al., 2020].…”
Section: Two Parts of Sample Complexity
Citation type: mentioning
Confidence: 99%
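
The substitution described in this excerpt, querying a learned opponent model instead of the real opponent during simulated rollouts, can be sketched as a small supervised model of the opponent's policy. Everything here (the class name OpponentModel, the discrete action space, the cross-entropy fit) is a hypothetical illustration rather than the cited method:

```python
# Hypothetical sketch: fit a model of the opponent's policy on logged
# (observation, action) pairs, then use it in place of the real opponent
# inside simulated rollouts.
import torch
import torch.nn as nn

class OpponentModel(nn.Module):
    """Supervised model of an opponent's policy: observation -> action logits."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)

    def act(self, obs):                 # stands in for the real opponent
        return torch.argmax(self.net(obs), dim=-1)

model = OpponentModel(obs_dim=8, n_actions=4)
# Fit on data gathered from real interaction (here: random placeholders),
# then call model.act(...) in simulation instead of querying the opponent.
loss = nn.CrossEntropyLoss()(model(torch.randn(32, 8)),
                             torch.randint(0, 4, (32,)))
loss.backward()
```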
“…To address the limitation, the agent grouping method [12] employs a two-level graph neural network to model the inter-agent and inter-group relationships effectively. However, it ignores the communication relationship between the agents in the same group.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
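
The missing piece this excerpt points at, communication among agents within the same group, could be modeled as one round of message passing over a fully connected intra-group graph. A hypothetical sketch; the module name, aggregation rule, and shapes are all assumptions:

```python
# Hypothetical intra-group communication step: each agent updates its
# embedding with the mean of the other group members' messages.
import torch
import torch.nn as nn

class IntraGroupMessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h):                        # h: (n_agents, dim)
        msgs = self.msg(h)                       # per-agent messages
        n = h.shape[0]
        # mean over the *other* agents: (sum - own message) / (n - 1)
        agg = (msgs.sum(dim=0, keepdim=True) - msgs) / max(n - 1, 1)
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))

mp = IntraGroupMessagePassing(dim=64)
print(mp(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```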
“…Recently, deep reinforcement learning (DRL) has shown great potential in many domains, such as games [5,6] and robotics [7,8]. Inspired by the powerful perception and learning ability of DRL, researchers have made continuous attempts to apply DRL to multi-agent reinforcement learning (MARL) to promote multi-agent cooperative behaviors in environments with many agents [9][10][11][12][13][14][15]. Based on the common paradigm of centralized learning with decentralized execution, some MARL algorithms learn centralized critics for multiple agents and determine decentralized actions.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
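
The centralized-training, decentralized-execution pattern this excerpt summarizes can be sketched generically: a centralized critic scores the joint observations and actions during training, while each actor conditions only on its own observation at execution time. This is an illustration of the paradigm, not any specific cited architecture; all names and dimensions are assumed:

```python
# Generic CTDE sketch: per-agent actors use local observations only;
# a centralized critic sees the joint observation-action during training.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, obs):                      # decentralized: local obs only
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralizedCritic(nn.Module):
    def __init__(self, n_agents, obs_dim, n_actions):
        super().__init__()
        joint = n_agents * (obs_dim + n_actions)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_actions_onehot):   # centralized input
        return self.net(torch.cat([all_obs.flatten(),
                                   all_actions_onehot.flatten()]))

n_agents, obs_dim, n_actions = 3, 8, 4
actors = [Actor(obs_dim, n_actions) for _ in range(n_agents)]
critic = CentralizedCritic(n_agents, obs_dim, n_actions)
obs = torch.randn(n_agents, obs_dim)
acts = torch.stack([a(o).sample() for a, o in zip(actors, obs)])
onehot = torch.nn.functional.one_hot(acts, n_actions).float()
print(critic(obs, onehot))  # scalar joint value estimate
```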
“…In RL and MARL, various forms of inductive bias have been used to improve learning. The most straightforward inductive biases entail designing network structures for the critic or policy, such as attention networks [5], graph neural networks [15], and implicit communication structures [14]. However, biases in game information, such as state, reward, and action, have also been used in an attempt to boost training.…”
Section: Biases in RL and MARL
Citation type: mentioning
Confidence: 99%