2018
DOI: 10.1007/978-3-319-75931-9_5
Neural Fictitious Self-Play in Imperfect Information Games with Many Players

Cited by 8 publications (6 citation statements)
References 5 publications
“…For computational efficiency, a data-driven fictitious self-play framework has also been proposed, in which the best response is computed via fitted Q-iteration (Ernst et al., 2005; Munos, 2007) for the single-agent RL problem, with the policy mixture being learned through supervised learning. This framework was later adopted by Silver (2014, 2016) and Kawamura et al. (2017) to incorporate other single-agent RL methods such as deep Q-networks (Mnih et al., 2015) and Monte-Carlo tree search (Coulom, 2006; Kocsis and Szepesvári, 2006; Browne et al., 2012). Moreover, in more recent work, Perolat et al. (2018) proposed a smooth fictitious play algorithm (Fudenberg and Levine, 1995) for zero-sum stochastic games with simultaneous moves.…”
Section: Policy-based Methods
confidence: 99%
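The excerpt above summarizes the two-part structure that NFSP builds on: a best response learned with a single-agent RL method (fitted Q-iteration or a deep Q-network), and an average policy learned by supervised classification on the agent's own best-response actions. The sketch below illustrates that structure, assuming a PyTorch implementation; the network sizes, buffer sizes, anticipatory parameter `eta`, and the omission of a target network and reservoir sampling are illustrative simplifications, not details taken from the cited papers.

```python
# Illustrative NFSP-style agent: a Q-network approximates the best response
# (DQN-style), while a policy network approximates the average policy by
# supervised learning on the agent's own best-response actions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class NFSPAgent:
    def __init__(self, obs_dim, n_actions, eta=0.1, eps=0.06, gamma=0.99, lr=1e-3):
        self.n_actions, self.eta, self.eps, self.gamma = n_actions, eta, eps, gamma
        self.q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.pi_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.q_opt = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.pi_opt = torch.optim.Adam(self.pi_net.parameters(), lr=lr)
        self.rl_memory = deque(maxlen=100_000)    # transitions for Q-learning
        self.sl_memory = deque(maxlen=1_000_000)  # (obs, best-response action) pairs

    def act(self, obs):
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        if random.random() < self.eta:
            # Play the (epsilon-greedy) best response and record it for supervised learning.
            action = random.randrange(self.n_actions) if random.random() < self.eps \
                else int(self.q_net(obs_t).argmax())
            self.sl_memory.append((obs, action))
        else:
            # Play the average policy, i.e. the learned mixture over past best responses.
            probs = F.softmax(self.pi_net(obs_t), dim=-1)
            action = int(torch.multinomial(probs, 1))
        return action

    def store_transition(self, obs, action, reward, next_obs, done):
        self.rl_memory.append((obs, action, reward, next_obs, done))

    def learn(self, batch_size=128):
        # DQN-style update of the best-response network (target network omitted for brevity).
        if len(self.rl_memory) >= batch_size:
            obs, act, rew, nxt, done = zip(*random.sample(list(self.rl_memory), batch_size))
            obs = torch.as_tensor(obs, dtype=torch.float32)
            act = torch.as_tensor(act, dtype=torch.int64)
            rew = torch.as_tensor(rew, dtype=torch.float32)
            nxt = torch.as_tensor(nxt, dtype=torch.float32)
            done = torch.as_tensor(done, dtype=torch.float32)
            q = self.q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = rew + self.gamma * (1.0 - done) * self.q_net(nxt).max(dim=1).values
            q_loss = F.mse_loss(q, target)
            self.q_opt.zero_grad(); q_loss.backward(); self.q_opt.step()
        # Supervised (cross-entropy) update of the average-policy network.
        if len(self.sl_memory) >= batch_size:
            obs, act = zip(*random.sample(list(self.sl_memory), batch_size))
            logits = self.pi_net(torch.as_tensor(obs, dtype=torch.float32))
            pi_loss = F.cross_entropy(logits, torch.as_tensor(act, dtype=torch.int64))
            self.pi_opt.zero_grad(); pi_loss.backward(); self.pi_opt.step()
```

Through the anticipatory parameter `eta`, each agent plays a mixture of its best response and its average policy, which is what allows fictitious self-play to be run from sampled experience alone.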
“…Recently, this domain has seen a resurgence of interest due to advances in single-agent RL techniques. Indeed, a huge volume of work on MARL has appeared lately, focusing on either identifying new learning criteria and/or setups (Foerster et al., 2016; Zazo et al., 2016; Subramanian and Mahajan, 2019), or developing new algorithms for existing setups, thanks to developments in deep learning (Heinrich and Silver, 2016; Lowe et al., 2017; Gupta et al., 2017; Omidshafiei et al., 2017; Kawamura et al., 2017), operations research (Mazumdar and Ratliff, 2018; Jin et al., 2019; Sidford et al., 2019), and multi-agent systems (Oliehoek and Amato, 2016; Arslan and Yüksel, 2017; Yongacoglu et al., 2019). Nevertheless, not all of these efforts rest on rigorous theoretical footings, partly due to the limited understanding of even single-agent deep RL theory, and partly due to the inherent challenges of multi-agent settings.…”
confidence: 99%
“…In Leduc Poker, a simplification of the former, they approached an NE. Kawamura et al. [26] computed approximate NE strategies with NFSP in multiplayer IIGs.…”
Section: Neural Fictitious Self-Play
confidence: 99%
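"Approached an NE" is typically quantified via exploitability (NashConv): the total amount the players could gain by deviating to a best response against the others' current strategies, which is zero exactly at a Nash equilibrium. The snippet below is a minimal illustration for a two-player zero-sum normal-form game; it is my own sketch of the standard definition, while the cited works compute analogous quantities for sequential imperfect-information games such as Kuhn and Leduc poker, which requires a best-response traversal of the game tree rather than a payoff matrix.

```python
# Exploitability (NashConv) of a strategy profile in a two-player zero-sum
# normal-form game: the sum of each player's best-response gain. It is zero
# if and only if (x, y) is a Nash equilibrium.
import numpy as np

def exploitability(payoff: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """payoff[i, j] is the row player's payoff; the column player receives -payoff[i, j]."""
    value = x @ payoff @ y                  # row player's expected payoff under (x, y)
    br_row = (payoff @ y).max()             # row player's best-response value against y
    br_col = (-(x @ payoff)).max()          # column player's best-response value against x
    return (br_row - value) + (br_col + value)

# Matching pennies: the uniform profile is the unique NE, so its exploitability is 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
print(exploitability(A, uniform, uniform))                # 0.0
print(exploitability(A, np.array([0.8, 0.2]), uniform))   # > 0: the biased profile is exploitable
```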
“…Researchers have proposed variants of algorithms to generalize NFSP, and these variants have been applied to several imperfect-information domains such as Doudizhu [10], multiplayer Kuhn poker [11], security games [12], and autonomous vehicle control [13]. Although these algorithms have succeeded in practice, a crucial limitation is that they require many iterations to converge.…”
Section: Introduction
confidence: 99%