2021
DOI: 10.1609/aaai.v35i13.17353

Value-Decomposition Multi-Agent Actor-Critics

Abstract: The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance on the StarCraft II micromanagement testbed, a common MARL benchmark. However, our experiments demonstrate that, in some cases, QMIX performs sub-optimally with the A2C framework, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-of…
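As context for the abstract, the sketch below gives one minimal, illustrative reading of the idea it describes: per-agent actor-critic heads whose local state values are combined by a non-negative (monotonic) mixer and trained with an A2C-style temporal-difference advantage. This is not the paper's implementation; every class, function, and parameter name here is an assumption.

```python
# Minimal sketch (not the paper's implementation): per-agent actor-critics whose
# local state values are combined by a monotonic (non-negative-weight) mixer and
# trained with an A2C-style TD-advantage loss. All names are illustrative.
import torch
import torch.nn as nn


class LocalActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_actions)  # decentralized policy head
        self.v = nn.Linear(hidden, 1)           # local state-value head

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Categorical(logits=self.pi(h)), self.v(h)


class MonotonicMixer(nn.Module):
    """Combines local values into a joint value using non-negative weights,
    so the joint value is monotonic in every local value (QMIX-style)."""
    def __init__(self, n_agents, state_dim):
        super().__init__()
        self.w = nn.Linear(state_dim, n_agents)  # state-conditioned mixing weights
        self.b = nn.Linear(state_dim, 1)         # state-dependent bias

    def forward(self, local_vs, state):
        # local_vs: [batch, n_agents], state: [batch, state_dim]
        weights = torch.abs(self.w(state))       # enforce non-negativity
        return (weights * local_vs).sum(-1, keepdim=True) + self.b(state)


def a2c_loss(dist, actions, v_tot, returns, value_coef=0.5, entropy_coef=0.01):
    # dist: per-agent action distribution; actions: [batch];
    # v_tot / returns: [batch, 1] joint value estimates and bootstrapped targets.
    advantage = (returns - v_tot).squeeze(-1).detach()
    policy_loss = -(dist.log_prob(actions) * advantage).mean()
    value_loss = (returns - v_tot).pow(2).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * dist.entropy().mean()
```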

Cited by 48 publications (32 citation statements). References 15 publications.
“…However, unlike QMIX, VDAC is compatible with A2C, which makes sampling more efficient. In addition, the study demonstrates that by following a simple gradient calculated from a temporal-difference advantage, the policy converges to a local optimum (Su et al. 2021). Q-DPP (Yang et al. 2020b) does not rely on constraints to decompose the global value function.…”
Section: Centralised Training and Decentralised Execution
confidence: 87%
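The quoted claim concerns updating the decentralized policy along a gradient weighted by a temporal-difference advantage. A hypothetical, minimal illustration of that update term (symbol names are assumptions, not taken from the paper):

```python
# Hypothetical illustration of the quoted claim: the decentralized policy is
# updated along a gradient weighted by a temporal-difference advantage.
# Symbol names (reward, v, v_next, gamma) are assumptions, not from the paper.
import torch


def td_advantage(reward, v, v_next, gamma=0.99, done=False):
    """One-step TD advantage on tensors: A = r + gamma * V(s') * (1 - done) - V(s)."""
    return reward + gamma * v_next * (1.0 - float(done)) - v


def policy_gradient_term(log_prob, advantage):
    """Minimizing this term ascends the log-probability weighted by the advantage."""
    return -(log_prob * advantage.detach()).mean()
```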
“…Also, there exists a series of value-function-factorization-based methods that train decentralized policies in a centralized end-to-end fashion, employing a joint action-value network [27]. There are many follow-ups on actor-critic-based MARL algorithms addressing a variety of issues, namely SAC [26] for improving scalability, SEAC [4] for sharing experience, LICA [44] for the credit assignment problem, Tesseract [21] for tensorizing the critics, Bilevel Actor Critic [42] for the multi-agent coordination problem with unequal agents, DAC-TD [6] for training agents in a privacy-aware framework, VDAC [31] for combining the value-decomposition framework with actor-critic, Scalable Actor Critic [17] for scalable learning in a stochastic network of agents, etc. However, none of these works consider the role-emergence paradigm in their model.…”
Section: Related Literature
confidence: 99%
“…In MARL, each pond could be controlled by an individual agent tuned to that pond's specific goals, while also operating cooperatively towards system-level goals. 54,55 In MORL, sets of policies are learned to approximate a Pareto frontier; 56 this is especially valuable for comparing trade-offs among agents. Similar multi-objective optimization is well studied for reservoir operation and could provide an alternative to MORL.…”
Section: Towards System-level Control
confidence: 99%