2020 · Preprint
DOI: 10.48550/arxiv.2004.00603
No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Abstract: Recently, there has been growing interest in solution concepts that are less restrictive than Nash equilibrium in extensive-form games, with significant effort towards the computation of extensive-form correlated equilibrium (EFCE) and extensive-form coarse correlated equilibrium (EFCCE). In this paper, we show how to leverage the popular counterfactual regret minimization (CFR) paradigm to induce simple no-regret dynamics that converge to the set of EFCEs and EFCCEs in n-player general-sum extensive-form games.…
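The CFR-style dynamics mentioned in the abstract are built out of simple local no-regret updates. As a plain illustration (not the paper's actual construction, which composes regret minimizers over trigger deviations across the game tree), here is a minimal sketch of regret matching, the standard local rule in the CFR family; the function names and the single-decision-point setting are assumptions made for this example.

```python
import numpy as np

def regret_matching_strategy(cumulative_regret: np.ndarray) -> np.ndarray:
    """Play proportionally to the positive part of cumulative regret (regret matching)."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No action has accumulated positive regret yet: fall back to uniform play.
    return np.full(cumulative_regret.shape, 1.0 / cumulative_regret.size)

def accumulate_regret(cumulative_regret: np.ndarray,
                      action_utilities: np.ndarray,
                      strategy: np.ndarray) -> np.ndarray:
    """Add each action's instantaneous regret against the mixed strategy just played."""
    expected_utility = float(strategy @ action_utilities)
    return cumulative_regret + (action_utilities - expected_utility)

# Toy usage at a single decision point with fixed utilities: the per-iteration
# strategy concentrates on the best action and average regret shrinks over time.
regrets = np.zeros(3)
utilities = np.array([1.0, 0.2, -0.5])
for _ in range(100):
    strategy = regret_matching_strategy(regrets)
    regrets = accumulate_regret(regrets, utilities, strategy)
```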

Cited by 3 publications (7 citation statements) · References 7 publications (9 reference statements)
“…present the first line of results for learning Nash, CE, and CCE in general-sum Markov games; however, their sample complexity scales with ∏_{i≤m} A_i due to the model-based nature of their algorithm. Algorithms for learning CE in extensive-form games have been studied in (Celli et al., 2020), though we remark that Markov games and extensive-form games are different frameworks and our results do not imply each other.…”
Section: Related Work (contrasting)
confidence: 72%
“…The theorem instead implies that the historical average of the CFR policies π t , which is the policy returned by the DREAM algorithm, converges to a Nash equilibrium at this rate in two-player zero-sum games. In n-player general-sum games, it also converges to an extensive-form coarse correlated equilibrium at this rate [8].…”
Section: D.2 Bound on DREAM's Regret (mentioning)
confidence: 97%
“…Later, Farina et al. (2019) propose a min-max optimization formulation of EFCEs which can be solved by first-order methods. Celli et al. (2020) and its extended version (Farina et al., 2021a) design the first uncoupled no-regret algorithm for computing EFCEs. Their algorithms are based on minimizing the trigger regret (first considered in Dudik and Gordon (2012) and Gordon et al. (2008)) via counterfactual regret decomposition (Zinkevich et al., 2007).…”
Section: Related Work (mentioning)
confidence: 99%
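The trigger-regret property referenced in the passage above admits a compact schematic statement. The notation below is assumed for illustration only (Ψ_i for player i's set of trigger deviations, π^t for the joint profile played at iteration t, μ̄^T for the empirical distribution of play); the cited papers give the precise definitions.

\[
\text{if, for every player } i:\quad
\frac{1}{T}\,\max_{\hat\sigma \in \Psi_i}\sum_{t=1}^{T}
\Bigl( u_i\bigl(\hat\sigma(\pi_i^t),\, \pi_{-i}^t\bigr) - u_i(\pi^t) \Bigr) \le \varepsilon_T,
\quad\text{then } \bar{\mu}^T \text{ is an } \varepsilon_T\text{-approximate EFCE.}
\]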
“…Equivalence between 1-EFCE and trigger definition of EFCE: at the special case K = 1, our (exact) 1-EFCE is equivalent to the existing definition of EFCE based on trigger policies (Gordon et al., 2008; Celli et al., 2020), which defines an ε-approximate EFCE as any correlated policy π such that the following trigger gap is at most ε:…”
Section: Properties of K-EFCE (mentioning)
confidence: 99%