2018
DOI: 10.1007/978-3-319-75931-9_5
Neural Fictitious Self-Play in Imperfect Information Games with Many Players

Cited by 8 publications (6 citation statements)
References 5 publications
“…For computational efficiency, a data-driven fictitious self-play framework has also been proposed, in which the best response is computed via fitted Q-iteration (Ernst et al., 2005; Munos, 2007) for the single-agent RL problem, with the policy mixture being learned through supervised learning. This framework was later adopted by Silver (2014, 2016) and Kawamura et al. (2017) to incorporate other single-agent RL methods such as deep Q-networks (Mnih et al., 2015) and Monte-Carlo tree search (Coulom, 2006; Kocsis and Szepesvári, 2006; Browne et al., 2012). Moreover, in more recent work, Perolat et al. (2018) proposed a smooth fictitious play algorithm (Fudenberg and Levine, 1995) for zero-sum stochastic games with simultaneous moves.…”
Section: Policy-based Methods
confidence: 99%
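The excerpt above summarizes the two-part structure that NFSP builds on: a best response learned with a single-agent RL method (fitted Q-iteration or a deep Q-network), and an average policy learned by supervised classification on the agent's own best-response actions. The sketch below illustrates that structure, assuming a PyTorch implementation; the network sizes, buffer sizes, anticipatory parameter `eta`, and the omission of a target network and reservoir sampling are illustrative simplifications, not details taken from the cited papers.

```python
# Illustrative NFSP-style agent: a Q-network approximates the best response
# (DQN-style), while a policy network approximates the average policy by
# supervised learning on the agent's own best-response actions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class NFSPAgent:
    def __init__(self, obs_dim, n_actions, eta=0.1, eps=0.06, gamma=0.99, lr=1e-3):
        self.n_actions, self.eta, self.eps, self.gamma = n_actions, eta, eps, gamma
        self.q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.pi_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.q_opt = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.pi_opt = torch.optim.Adam(self.pi_net.parameters(), lr=lr)
        self.rl_memory = deque(maxlen=100_000)    # transitions for Q-learning
        self.sl_memory = deque(maxlen=1_000_000)  # (obs, best-response action) pairs

    def act(self, obs):
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        if random.random() < self.eta:
            # Play the (epsilon-greedy) best response and record it for supervised learning.
            action = random.randrange(self.n_actions) if random.random() < self.eps \
                else int(self.q_net(obs_t).argmax())
            self.sl_memory.append((obs, action))
        else:
            # Play the average policy, i.e. the learned mixture over past best responses.
            probs = F.softmax(self.pi_net(obs_t), dim=-1)
            action = int(torch.multinomial(probs, 1))
        return action

    def store_transition(self, obs, action, reward, next_obs, done):
        self.rl_memory.append((obs, action, reward, next_obs, done))

    def learn(self, batch_size=128):
        # DQN-style update of the best-response network (target network omitted for brevity).
        if len(self.rl_memory) >= batch_size:
            obs, act, rew, nxt, done = zip(*random.sample(list(self.rl_memory), batch_size))
            obs = torch.as_tensor(obs, dtype=torch.float32)
            act = torch.as_tensor(act, dtype=torch.int64)
            rew = torch.as_tensor(rew, dtype=torch.float32)
            nxt = torch.as_tensor(nxt, dtype=torch.float32)
            done = torch.as_tensor(done, dtype=torch.float32)
            q = self.q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = rew + self.gamma * (1.0 - done) * self.q_net(nxt).max(dim=1).values
            q_loss = F.mse_loss(q, target)
            self.q_opt.zero_grad(); q_loss.backward(); self.q_opt.step()
        # Supervised (cross-entropy) update of the average-policy network.
        if len(self.sl_memory) >= batch_size:
            obs, act = zip(*random.sample(list(self.sl_memory), batch_size))
            logits = self.pi_net(torch.as_tensor(obs, dtype=torch.float32))
            pi_loss = F.cross_entropy(logits, torch.as_tensor(act, dtype=torch.int64))
            self.pi_opt.zero_grad(); pi_loss.backward(); self.pi_opt.step()
```

Through the anticipatory parameter `eta`, each agent plays a mixture of its best response and its average policy, which is what allows fictitious self-play to be run from sampled experience alone.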
“…Recently, this domain has seen a resurgence of interest due to advances in single-agent RL techniques. Indeed, a huge volume of work on MARL has appeared lately, focusing on either identifying new learning criteria and/or setups (Foerster et al., 2016; Zazo et al., 2016; Subramanian and Mahajan, 2019), or developing new algorithms for existing setups, thanks to developments in deep learning (Heinrich and Silver, 2016; Lowe et al., 2017; Gupta et al., 2017; Omidshafiei et al., 2017; Kawamura et al., 2017), operations research (Mazumdar and Ratliff, 2018; Jin et al., 2019; Sidford et al., 2019), and multi-agent systems (Oliehoek and Amato, 2016; Arslan and Yüksel, 2017; Yongacoglu et al., 2019). Nevertheless, not all of these efforts rest on rigorous theoretical footings, partly due to the limited understanding of even single-agent deep RL theory, and partly due to the inherent challenges of multi-agent settings.…”
confidence: 99%
“…In Leduc Poker, a simplification of the former, they approached an NE. Kawamura et al. [26] computed approximate NE strategies with NFSP in multiplayer IIGs.…”
Section: Neural Fictitious Self-Play
confidence: 99%
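"Approached an NE" is typically quantified via exploitability (NashConv): the total amount the players could gain by deviating to a best response against the others' current strategies, which is zero exactly at a Nash equilibrium. The snippet below is a minimal illustration for a two-player zero-sum normal-form game; it is my own sketch of the standard definition, while the cited works compute analogous quantities for sequential imperfect-information games such as Kuhn and Leduc poker, which requires a best-response traversal of the game tree rather than a payoff matrix.

```python
# Exploitability (NashConv) of a strategy profile in a two-player zero-sum
# normal-form game: the sum of each player's best-response gain. It is zero
# if and only if (x, y) is a Nash equilibrium.
import numpy as np

def exploitability(payoff: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """payoff[i, j] is the row player's payoff; the column player receives -payoff[i, j]."""
    value = x @ payoff @ y                  # row player's expected payoff under (x, y)
    br_row = (payoff @ y).max()             # row player's best-response value against y
    br_col = (-(x @ payoff)).max()          # column player's best-response value against x
    return (br_row - value) + (br_col + value)

# Matching pennies: the uniform profile is the unique NE, so its exploitability is 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
print(exploitability(A, uniform, uniform))                # 0.0
print(exploitability(A, np.array([0.8, 0.2]), uniform))   # > 0: the biased profile is exploitable
```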
“…Researchers have proposed variants of algorithms to generalize NFSP, and these variants have been applied to several imperfect-information domains such as Doudizhu [10], multiplayer Kuhn poker [11], security games [12], and autonomous vehicle control [13]. Although these algorithms have succeeded in practice, a crucial limitation is that they require many iterations to converge.…”
Section: Introduction
confidence: 99%