2022
DOI: 10.1007/s00521-022-07960-5
Difference rewards policy gradients

Abstract: Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies wh…
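
The abstract only sketches the approach, so the following is a minimal, hypothetical illustration (not the paper's actual Dr.Reinforce implementation) of the general idea of reinforcing each agent with a difference reward instead of the shared global reward in a REINFORCE-style update. The toy environment, default action, number of agents, and learning rate are all assumptions made for this example.

```python
# Sketch: decentralized softmax policies trained with difference rewards.
# Everything here (reward function, sizes, hyperparameters) is illustrative only.

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_ACTIONS, DEFAULT_ACTION = 3, 2, 0
theta = np.zeros((N_AGENTS, N_ACTIONS))  # per-agent policy logits (stateless for brevity)

def global_reward(joint_action):
    # Toy team objective: one point for every agent that picks action 1.
    return float(np.sum(joint_action))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = np.array([softmax(theta[i]) for i in range(N_AGENTS)])
    joint = np.array([rng.choice(N_ACTIONS, p=probs[i]) for i in range(N_AGENTS)])
    for i in range(N_AGENTS):
        # Difference reward: remove agent i's contribution by substituting a default action.
        counterfactual = joint.copy()
        counterfactual[i] = DEFAULT_ACTION
        d_i = global_reward(joint) - global_reward(counterfactual)
        # REINFORCE gradient for a softmax policy: (one_hot(a_i) - pi_i) scaled by D_i.
        grad_log_pi = -probs[i]
        grad_log_pi[joint[i]] += 1.0
        theta[i] += 0.1 * d_i * grad_log_pi
```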

Cited by 2 publications (1 citation statement) · References 24 publications (58 reference statements)
“…The general solution to this problem is reward shaping, with difference rewards and potential-based reward shaping as the two main classes. Difference rewards consider both the individual and the global reward (Foerster et al. 2018b; Proper and Tumer 2012; Nguyen et al. 2018; Castellini et al. 2021) and help an agent understand its impact on the environment by removing the noise created by other acting agents. Specifically, the difference reward of agent i is defined as D_i(z) = G(z) − G(z − z_i), where G(z) is the global reward for the joint state-action z, and G(z − z_i) is the global reward for a modified state-action vector in which agent i takes a default action, or, more intuitively, the global reward without the contribution of agent i (Yliniemi and Tumer 2014).…”
Section: Reward Shaping
confidence: 99%
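
For concreteness, here is a tiny worked check of the quoted definition D_i(z) = G(z) − G(z − z_i), using a made-up global reward and a null default action; the function names and numbers are illustrative, not taken from the cited works.

```python
# Hypothetical illustration of the difference reward D_i(z) = G(z) - G(z - z_i),
# where the counterfactual replaces agent i's action with a default action.

def global_reward(joint_action):
    # Toy team objective (illustrative only): one point per agent choosing action 1.
    return sum(joint_action)

def difference_reward(joint_action, i, default_action=0):
    counterfactual = list(joint_action)
    counterfactual[i] = default_action      # remove agent i's contribution
    return global_reward(joint_action) - global_reward(counterfactual)

z = [1, 0, 1]                               # joint action of three agents
print(difference_reward(z, 0))              # 1: agent 0 improved the global reward
print(difference_reward(z, 1))              # 0: agent 1 added nothing beyond the default
```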