Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems 2003
DOI: 10.1145/860575.860686

Adaptive policy gradient in multiagent learning

Abstract: Inspired by the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA, we explore an alternative version of WoLF. We show that our new WoLF criterion (PDWoLF) is also accurate in 2 × 2 games, while being accurately computable even in more than 2-action games, unlike WoLF that relies on estimation. In particular, we show that this difference in accuracy in more than 2-action games translates to faster convergence (to Nash equilibrium policies i…
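Since the abstract hinges on how PDWoLF decides whether an agent is winning or losing from the policy dynamics alone, a minimal sketch may help. The sign test used below (treating an action as "winning" when the first- and second-order policy changes have opposite signs), the step sizes, and the toy policies are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

# Hedged sketch of a PDWoLF-style win/lose test (assumed form, not the authors' code).
# delta(a)  = pi_t(a)  - pi_{t-1}(a)          first-order policy change
# delta2(a) = delta_t(a) - delta_{t-1}(a)     second-order policy change
# An action is treated as "winning" when delta(a) * delta2(a) < 0, so the smaller,
# cautious step size is used; otherwise the larger "losing" step size is used.

def pdwolf_learning_rate(delta, delta2, delta_win=0.05, delta_lose=0.2):
    """Pick a per-action step size from the policy-dynamics (PD) sign test."""
    winning = delta * delta2 < 0.0           # elementwise test over actions
    return np.where(winning, delta_win, delta_lose)

# Toy usage with a 3-action policy, the setting where plain WoLF must resort to estimation.
pi_prev2 = np.array([0.30, 0.30, 0.40])
pi_prev1 = np.array([0.25, 0.35, 0.40])
pi_now   = np.array([0.22, 0.38, 0.40])

delta  = pi_now - pi_prev1
delta2 = delta - (pi_prev1 - pi_prev2)
print(pdwolf_learning_rate(delta, delta2))   # e.g. [0.05 0.05 0.2]
```

The point of the sketch is that the test needs only the agent's own recent policies, which is why it stays exactly computable beyond 2-action games.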

Cited by 34 publications (12 citation statements)
References 8 publications
“…The rationale is that the agent should escape fast from losing situations, while adapting cautiously when it is winning, in order to encourage convergence. The win/lose criterion in (25) is based either on a comparison of an average policy with the current one, in the original version of WoLF-PHC, or on the second-order difference of policy elements, in PD-WoLF [74].…”
Section: Instantiations of Correlated Equilibrium Q-Learning (CE-Q)
Citation type: mentioning, confidence: 99%
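The excerpt above contrasts the two win tests. For reference, here is a hedged sketch of the WoLF-PHC side, where winning is judged by comparing the expected value of the current policy against that of a maintained average policy; the variable names and step sizes are illustrative and not taken from the cited papers.

```python
import numpy as np

# Hedged sketch of the WoLF-PHC win test described in the excerpt above:
# the agent is "winning" when its current policy pi scores better, under its own
# Q estimates for the state, than its running average policy pi_avg.

def wolf_phc_learning_rate(pi, pi_avg, q, delta_win=0.05, delta_lose=0.2):
    """Return the PHC step size chosen by the WoLF average-policy comparison."""
    winning = np.dot(pi, q) > np.dot(pi_avg, q)
    return delta_win if winning else delta_lose

# Toy usage: Q-values and policies for a single state of a 3-action game.
q      = np.array([1.0, 0.5, 0.2])
pi     = np.array([0.6, 0.3, 0.1])
pi_avg = np.array([0.4, 0.4, 0.2])
print(wolf_phc_learning_rate(pi, pi_avg, q))   # winning -> smaller, cautious step
```

PD-WoLF replaces this value comparison with the second-order policy-difference test sketched earlier, which avoids estimating the average policy's value.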
“…They are converging to the Nash equilibrium at the centre, but more slowly as time goes on as the distance between points is decreasing. ing) [9], (Policy Dynamics based WoLF) PDWoLF-PHC [7], and (Generalised IGA) GIGA-WoLF [8].…”
Section: Reinforcement Learning
Citation type: mentioning, confidence: 99%
“…A population of simultaneously co-adapting or coevolving agents in an uncooperative setting may converge or exhibit complex dynamics [37,12,13,14,38,9,7,8,2,11,44,6,23]. The goal of this work is to address the question of whether convergence is enhanced if each agent assumes that the other agents are changing their strategies over time.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…The win criterion is based either on a comparison of an average policy with the current one, in the original version of WoLF-PHC, or on the second-order difference of policy elements, in PD-WoLF [34]. The rationale is that the agent should escape fast from losing situations, while adapting cautiously when it is winning, in order to encourage convergence.…”
Section: Repeated Games
Citation type: mentioning, confidence: 99%