2022
DOI: 10.1007/s00521-022-07960-5
Difference rewards policy gradients

Abstract: Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies wh…
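
The abstract only sketches the approach, so the following is a minimal, hypothetical illustration (not the paper's actual Dr.Reinforce implementation) of the general idea of reinforcing each agent with a difference reward instead of the shared global reward in a REINFORCE-style update. The toy environment, default action, number of agents, and learning rate are all assumptions made for this example.

```python
# Sketch: decentralized softmax policies trained with difference rewards.
# Everything here (reward function, sizes, hyperparameters) is illustrative only.

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_ACTIONS, DEFAULT_ACTION = 3, 2, 0
theta = np.zeros((N_AGENTS, N_ACTIONS))  # per-agent policy logits (stateless for brevity)

def global_reward(joint_action):
    # Toy team objective: one point for every agent that picks action 1.
    return float(np.sum(joint_action))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = np.array([softmax(theta[i]) for i in range(N_AGENTS)])
    joint = np.array([rng.choice(N_ACTIONS, p=probs[i]) for i in range(N_AGENTS)])
    for i in range(N_AGENTS):
        # Difference reward: remove agent i's contribution by substituting a default action.
        counterfactual = joint.copy()
        counterfactual[i] = DEFAULT_ACTION
        d_i = global_reward(joint) - global_reward(counterfactual)
        # REINFORCE gradient for a softmax policy: (one_hot(a_i) - pi_i) scaled by D_i.
        grad_log_pi = -probs[i]
        grad_log_pi[joint[i]] += 1.0
        theta[i] += 0.1 * d_i * grad_log_pi
```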

Cited by 2 publications (1 citation statement) · References 24 publications (58 reference statements)
“…The general solution to this problem is reward shaping, with difference rewards and potential-based reward shaping as the two main classes. Difference rewards consider both the individual and the global reward (Foerster et al. 2018b; Proper and Tumer 2012; Nguyen et al. 2018; Castellini et al. 2021) and help an agent understand its impact on the environment by removing the noise created by other acting agents. Specifically, the difference reward of agent i is defined as D_i(z) = G(z) − G(z − z_i), where G(z) is the global reward for the joint state-action z, and G(z − z_i) is the global reward for a modified state-action vector in which agent i takes a default action, or, more intuitively, the global reward without the contribution of agent i (Yliniemi and Tumer 2014).…”
Section: Reward Shaping
confidence: 99%
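
For concreteness, here is a tiny worked check of the quoted definition D_i(z) = G(z) − G(z − z_i), using a made-up global reward and a null default action; the function names and numbers are illustrative, not taken from the cited works.

```python
# Hypothetical illustration of the difference reward D_i(z) = G(z) - G(z - z_i),
# where the counterfactual replaces agent i's action with a default action.

def global_reward(joint_action):
    # Toy team objective (illustrative only): one point per agent choosing action 1.
    return sum(joint_action)

def difference_reward(joint_action, i, default_action=0):
    counterfactual = list(joint_action)
    counterfactual[i] = default_action      # remove agent i's contribution
    return global_reward(joint_action) - global_reward(counterfactual)

z = [1, 0, 1]                               # joint action of three agents
print(difference_reward(z, 0))              # 1: agent 0 improved the global reward
print(difference_reward(z, 1))              # 0: agent 1 added nothing beyond the default
```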