2020
DOI: 10.48550/arxiv.2001.03415
Preprint

Multi-Agent Interactions Modeling with Correlated Policies

Abstract: In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies, which can recover agents' policies…
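As a rough illustration of the correlated-policy idea in the abstract, the following is a minimal sketch of one way to factor a joint policy as an approximate opponent model times an ego policy conditioned on the opponent's action, i.e. π(a_i, a_-i | s) = π̂_-i(a_-i | s) · π_i(a_i | s, a_-i). All class and variable names, network sizes, and the discrete-action setting are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class CorrelatedPolicy(nn.Module):
    """Sketch: agent i's policy conditioned on an approximated opponent action,
    so the joint factorizes as
    pi(a_i, a_-i | s) = opp_model(a_-i | s) * pi_i(a_i | s, a_-i)."""

    def __init__(self, state_dim, act_dim_i, act_dim_opp, hidden=64):
        super().__init__()
        # Opponent model: approximates pi_-i(a_-i | s) with a categorical head.
        self.opp_model = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim_opp))
        # Ego policy: conditions on the state and a (sampled) opponent action.
        self.ego = nn.Sequential(
            nn.Linear(state_dim + act_dim_opp, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim_i))

    def forward(self, state):
        # Sample an opponent action from the learned opponent model.
        opp_logits = self.opp_model(state)
        opp_dist = torch.distributions.Categorical(logits=opp_logits)
        a_opp = opp_dist.sample()
        a_opp_onehot = nn.functional.one_hot(a_opp, opp_logits.shape[-1]).float()
        # Condition the ego policy on the sampled opponent action.
        ego_logits = self.ego(torch.cat([state, a_opp_onehot], dim=-1))
        ego_dist = torch.distributions.Categorical(logits=ego_logits)
        a_i = ego_dist.sample()
        # Joint log-probability under the correlated factorization.
        joint_logp = opp_dist.log_prob(a_opp) + ego_dist.log_prob(a_i)
        return a_i, a_opp, joint_logp
```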

Cited by 4 publications (4 citation statements)
References 23 publications

“…While previous works have delved into correlated policies through various methodologies, such as explicit modeling and recursive reasoning frameworks, our approach diverges by prioritizing the maximization of MI between actions of multiple agents. This emphasis on MI serves as a comprehensive measure of correlation, aiming to foster effective coordination among agents in MARL settings [19,20]. A promising direction is to leverage principles from information theory to design coordination strategies.…”
Section: Related Work
confidence: 99%
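For reference, the mutual information between two agents' actions referred to in the statement above is the standard quantity (generic notation, not taken verbatim from the cited papers):

I(a^i; a^j) = \mathbb{E}_{p(a^i, a^j)}\!\left[\log \frac{p(a^i, a^j)}{p(a^i)\, p(a^j)}\right],

which is zero exactly when the two agents' action distributions are independent, so maximizing it encourages correlated, coordinated behavior.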
“…The standard maximum entropy MARL method learns the joint stochastic policy 𝝅(𝒖_t | 𝒐_t) while the individual policy π_i(u_i | o_i) is unavailable, which violates the CTDE framework. To deal with the issue, we attempt to use the multivariate Gaussian distribution N_M to model the interaction between individual policies because the behavioral strategies reflect agents' cooperation relationship [18]. Let d denote the dimension of the action space for each agent and Σ denote the covariance matrix of the multivariate Gaussian distribution.…”
Section: Collaborative Exploration Module
confidence: 99%
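A minimal sketch of the multivariate-Gaussian construction described in the statement above: a single Gaussian over the concatenated per-agent action dimensions, whose covariance matrix Σ couples the agents' actions. The shapes, the example covariance, and the function name are illustrative assumptions, not the cited paper's implementation.

```python
import numpy as np

def sample_correlated_actions(means, sigma, rng=None):
    """Sample one joint action for n agents, each with a d-dimensional
    continuous action, from a multivariate Gaussian whose covariance
    couples the agents' action dimensions.

    means : (n, d) array of per-agent action means
    sigma : (n*d, n*d) covariance matrix encoding cross-agent correlations
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = means.shape
    flat_mean = means.reshape(n * d)
    joint = rng.multivariate_normal(flat_mean, sigma)  # one correlated draw
    return joint.reshape(n, d)                         # split back per agent

# Example: two agents with 2-D actions, positively correlated across agents.
n, d = 2, 2
means = np.zeros((n, d))
sigma = np.eye(n * d) + 0.5 * np.kron(np.array([[0.0, 1.0], [1.0, 0.0]]),
                                      np.eye(d))
actions = sample_correlated_actions(means, sigma)
print(actions.shape)  # (2, 2)
```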
“…The correlated policies are considered in several other works too. [15] proposed the explicit modeling of correlated policies for multi-agent imitation learning, and [25] proposed a probabilistic recursive reasoning framework. By introducing a latent variable and variational lower bound on mutual information, the proposed VM3-AC increases the correlation among policies without communication in the execution phase and without explicit dependency across agents' actions.…”
Section: Appendix A: Related Work
confidence: 99%
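For context on the variational lower bound mentioned in the statement above, the standard Barber–Agakov bound on the mutual information between two agents' actions a^i and a^j reads (generic notation assumed here; the cited paper's exact bound and latent-variable construction may differ):

I(a^i; a^j) \;\ge\; H(a^i) + \mathbb{E}_{p(a^i, a^j)}\!\left[\log q_\phi(a^i \mid a^j)\right],

where q_φ is any variational approximation to the conditional p(a^i | a^j); the bound is tight when q_φ matches the true conditional.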
“…Such non-correlated factorization of the joint policy limits the agents to learn coordinated behavior due to negligence of the influence of other agents [25,2]. However, learning coordinated behavior is one of the fundamental problems in MARL [25,15].…”
Section: Introduction
confidence: 99%