Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems 2003
DOI: 10.1145/860575.860686

Adaptive policy gradient in multiagent learning

Abstract: Inspired by the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA, we explore an alternative version of WoLF. We show that our new WoLF criterion (PDWoLF) is also accurate in 2 × 2 games, while being accurately computable even in more than 2-action games, unlike WoLF that relies on estimation. In particular, we show that this difference in accuracy in more than 2-action games translates to faster convergence (to Nash equilibrium policies i…
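Since the abstract hinges on how PDWoLF decides whether an agent is winning or losing from the policy dynamics alone, a minimal sketch may help. The sign test used below (treating an action as "winning" when the first- and second-order policy changes have opposite signs), the step sizes, and the toy policies are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

# Hedged sketch of a PDWoLF-style win/lose test (assumed form, not the authors' code).
# delta(a)  = pi_t(a)  - pi_{t-1}(a)          first-order policy change
# delta2(a) = delta_t(a) - delta_{t-1}(a)     second-order policy change
# An action is treated as "winning" when delta(a) * delta2(a) < 0, so the smaller,
# cautious step size is used; otherwise the larger "losing" step size is used.

def pdwolf_learning_rate(delta, delta2, delta_win=0.05, delta_lose=0.2):
    """Pick a per-action step size from the policy-dynamics (PD) sign test."""
    winning = delta * delta2 < 0.0           # elementwise test over actions
    return np.where(winning, delta_win, delta_lose)

# Toy usage with a 3-action policy, the setting where plain WoLF must resort to estimation.
pi_prev2 = np.array([0.30, 0.30, 0.40])
pi_prev1 = np.array([0.25, 0.35, 0.40])
pi_now   = np.array([0.22, 0.38, 0.40])

delta  = pi_now - pi_prev1
delta2 = delta - (pi_prev1 - pi_prev2)
print(pdwolf_learning_rate(delta, delta2))   # e.g. [0.05 0.05 0.2]
```

The point of the sketch is that the test needs only the agent's own recent policies, which is why it stays exactly computable beyond 2-action games.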

Cited by 34 publications (12 citation statements)
References 8 publications
“…The rationale is that the agent should escape fast from losing situations, while adapting cautiously when it is winning, in order to encourage convergence. The win/lose criterion in (25) is based either on a comparison of an average policy with the current one, in the original version of WoLF-PHC, or on the second-order difference of policy elements, in PD-WoLF [74].…”
Section: Instantiations of Correlated Equilibrium Q-Learning (CE-Q)
Citation type: mentioning, confidence: 99%
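The excerpt above contrasts the two win tests. For reference, here is a hedged sketch of the WoLF-PHC side, where winning is judged by comparing the expected value of the current policy against that of a maintained average policy; the variable names and step sizes are illustrative and not taken from the cited papers.

```python
import numpy as np

# Hedged sketch of the WoLF-PHC win test described in the excerpt above:
# the agent is "winning" when its current policy pi scores better, under its own
# Q estimates for the state, than its running average policy pi_avg.

def wolf_phc_learning_rate(pi, pi_avg, q, delta_win=0.05, delta_lose=0.2):
    """Return the PHC step size chosen by the WoLF average-policy comparison."""
    winning = np.dot(pi, q) > np.dot(pi_avg, q)
    return delta_win if winning else delta_lose

# Toy usage: Q-values and policies for a single state of a 3-action game.
q      = np.array([1.0, 0.5, 0.2])
pi     = np.array([0.6, 0.3, 0.1])
pi_avg = np.array([0.4, 0.4, 0.2])
print(wolf_phc_learning_rate(pi, pi_avg, q))   # winning -> smaller, cautious step
```

PD-WoLF replaces this value comparison with the second-order policy-difference test sketched earlier, which avoids estimating the average policy's value.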
“…They are converging to the Nash equilibrium at the centre, but more slowly as time goes on as the distance between points is decreasing. ing) [9], (Policy Dynamics based WoLF) PDWoLF-PHC [7], and (Generalised IGA) GIGA-WoLF [8].…”
Section: Reinforcement Learning
Citation type: mentioning, confidence: 99%
“…A population of simultaneously co-adapting or coevolving agents in an uncooperative setting may converge or exhibit complex dynamics [37,12,13,14,38,9,7,8,2,11,44,6,23]. The goal of this work is to address the question of whether convergence is enhanced if each agent assumes that the other agents are changing their strategies over time.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…The win criterion is based either on a comparison of an average policy with the current one, in the original version of WoLF-PHC, or on the second-order difference of policy elements, in PD-WoLF [34]. The rationale is that the agent should escape fast from losing situations, while adapting cautiously when it is winning, in order to encourage convergence.…”
Section: Repeated Games
Citation type: mentioning, confidence: 99%