2020
DOI: 10.48550/arxiv.2006.10410
Preprint

DREAM: Deep Regret minimization with Advantage baselines and Model-free learning

Abstract: We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash equilibrium in two-player zero-sum games and to an extensive-form coarse correlated equilibrium in all other games. Our primary innovation is an effective algorithm that, in contrast to other regret-based deep learning algorithms, does not require access to a perfect simulator of the game to achieve good performance. We show that DREAM…
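The policy rule at the heart of CFR-style methods, including the deep variants discussed on this page, is regret matching: action probabilities at an information state are proportional to the positive part of the per-action regret (or predicted advantage) values. The Python sketch below illustrates the generic rule only, not the paper's implementation; the advantage values are assumed inputs (in a deep method such as DREAM they would come from a learned advantage network).

import numpy as np

def regret_matching_policy(advantages: np.ndarray) -> np.ndarray:
    # Probabilities proportional to the positive part of the advantages;
    # uniform if nothing is positive. This is the standard regret-matching
    # rule used throughout the CFR family.
    positive = np.maximum(advantages, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(advantages), 1.0 / len(advantages))

# Hypothetical predicted advantages for three actions at one infostate.
print(regret_matching_policy(np.array([0.5, -0.2, 1.5])))  # -> [0.25 0.   0.75]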

Cited by 6 publications (7 citation statements) | References 21 publications (47 reference statements)
“…There have been a number of RL algorithms that have been proposed for two-player zero-sum games: Fictitious Self-Play (47), Policy-Space Response Oracles (PSRO) (48), Double Neural CFR (49), Deep CFR and DREAM (50,51), Regret Policy Gradients (52), Exploitability Descent (53), Neural Replicator Dynamics (NeuRD) (54), Advantage Regret-Matching Actor Critic (55), Friction FoReL (56), Extensive-form Double Oracle (XDO) (57), Neural Auto-curricula (NAC) (58), and Regularized Nash Dynamics (R-NaD) (59). These methods adapt classical algorithms for computing (approximate) Nash equilibria to the RL setting with sampled experience and general function approximation.…”
Section: Related Work
confidence: 99%
“…It is time-consuming for CFR to traverse the full game tree when solving large-scale imperfect-information extensive-form games. Therefore, to solve such games, several sampling-based CFR algorithms have been proposed, such as external sampling, outcome sampling [8], probe sampling [9], and other variance-reduction sampling algorithms [22,23]. These sampling-based CFR algorithms only traverse a subset of the full game tree.…”
Section: u_i^σ(I)
confidence: 99%
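To make the contrast concrete, the sketch below compares a full-tree traversal with a single-trajectory (outcome-sampling-style) traversal for estimating an expected utility under a fixed strategy. It is a schematic illustration under assumed interfaces: game.is_terminal, game.utility, game.actions, game.apply, and strategy are hypothetical placeholders, not APIs from the cited works.

import random

def full_expected_value(game, state, strategy):
    # Enumerate every action at every node, as a full (vanilla CFR-style)
    # traversal must; cost scales with the whole game tree.
    if game.is_terminal(state):
        return game.utility(state)
    return sum(strategy(state, a) * full_expected_value(game, game.apply(state, a), strategy)
               for a in game.actions(state))

def sampled_expected_value(game, state, strategy):
    # Unbiased single-trajectory estimate of the same value: sample one
    # action per node from the strategy and follow it, so only one
    # root-to-leaf path is visited per iteration.
    if game.is_terminal(state):
        return game.utility(state)
    actions = game.actions(state)
    weights = [strategy(state, a) for a in actions]
    a = random.choices(actions, weights=weights, k=1)[0]
    return sampled_expected_value(game, game.apply(state, a), strategy)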
“…In this section, we use the same notation as in DREAM (Steinberger et al., 2020). An extensive-form game progresses through a sequence of player actions, and has a world state w ∈ W at each step.…”
Section: Extensive-form Games
confidence: 99%
“…However, Deep CFR uses external sampling, which may be impractical for games with a large branching factor such as Stratego and Barrage Stratego. DREAM (Steinberger et al., 2020) and ARMAC (Gruslys et al., 2020) are model-free regret-based deep learning approaches.…”
Section: Related Work
confidence: 99%
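The distinction this statement draws can be sketched as follows: external sampling must branch over every one of the traverser's actions from the same state, which presumes a simulator that can be copied or replayed, whereas a model-free method only follows the single trajectory the environment actually produces. The Python below is a schematic external-sampling traversal under the same hypothetical game/strategy interface as in the earlier sketch; it is not the algorithm from Deep CFR or DREAM, only an illustration of why the simulator requirement arises.

import random

def external_sampling_value(game, state, traverser, strategy):
    # Schematic external-sampling traversal. At the traverser's own decision
    # nodes every action is explored from the same state -- this is the step
    # that needs a simulator able to replay or copy `state`. Opponent and
    # chance nodes are sampled, so the rest of the tree is visited sparsely.
    if game.is_terminal(state):
        return game.utility(state, traverser)
    actions = game.actions(state)
    if game.player(state) == traverser:
        action_values = [
            external_sampling_value(game, game.apply(state, a), traverser, strategy)
            for a in actions
        ]
        probs = [strategy(state, a) for a in actions]
        node_value = sum(p * v for p, v in zip(probs, action_values))
        # In a full implementation, action_values - node_value would give the
        # instantaneous regrets used as training targets for the advantage net.
        return node_value
    # Opponent or chance node: sample one action and continue down a single path.
    weights = [strategy(state, a) for a in actions]
    a = random.choices(actions, weights=weights, k=1)[0]
    return external_sampling_value(game, game.apply(state, a), traverser, strategy)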