Revisiting CFR+ and Alternating Updates

Burch, Neil; Moravčík, Matej; Schmid, Martin

doi:10.1613/jair.1.11370

Cited by 20 publications

(10 citation statements)

References 4 publications

(5 reference statements)


“…A well-known example is CFR by Zinkevich et al (2007) based on the regret-matching algorithm (Hart and Mas-Colell, 2000; Gordon, 2007). There exist many other variants of it, such as CFR+ (Tammelin, 2014; Burch et al, 2019), see also Farina et al (2019, 2021a). These algorithms however only enjoy a (known) guarantee of convergence of order O((X√A + Y√B)/√T).…”

Section: Introduction (mentioning)

confidence: 99%

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

Kozuno, Ménard, Munos et al. 2021

Preprint


We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular IIG under the perfect-recall assumption where the only feedback is realizations of the game (bandit feedback). In particular, the dynamics of the IIG are not known: we can only access them by sampling or interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order 1/√T, where T is the number of played games. Moreover, IXOMD is computationally efficient as it needs to perform the updates only along the sampled trajectory.


“…A number of CFR variants have been proposed since the pioneering work Zinkevich et al (2008) for improving computational efficiency. For example, Lanctot et al (2009); Burch et al (2012); Gibson et al (2012); Lisý et al (2015); Schmid et al (2019) combine CFR with Monte-Carlo sampling; Waugh et al (2015); Morrill (2016); propose to estimate the counterfactual value functions via regression; Brown and Sandholm (2015); ; improve efficiency by pruning suboptimal paths in the game tree; Tammelin (2014); Tammelin et al (2015); Burch et al (2019) analyze the performance of a modification named CFR+, and Zhou et al (2018) proposes lazy updates with a near-optimal regret upper bound.…”

Section: Policy-based Methods (mentioning)

confidence: 99%

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Zhang, Yang, Başar

2019

Preprint

154


Recent years have witnessed significant advances in reinforcement learning (RL), which has registered great success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent RL techniques. Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an assessment of the current state of the field on the mark, to identify fruitful future research directions on theoretical studies of MARL. 
We expect this chapter to serve as a continuing stimulus for researchers interested in working on this exciting yet challenging topic.


“…Finally, to show a linear convergence rate, we need to show the counterpart of Eq. (11), which is again more involved compared to the normal-form game case. Indeed, we are only able to do so for DOMWU by making use of its closed-form update described in Lemma 2.…”

Section: Analysis Of Theorem 6 and Theorem (mentioning)

confidence: 99%

“…However, due to their ergodic convergence guarantee, theoretical convergence rates of regret-minimization algorithms are typically limited to O(1/√T) or O(1/T) for T rounds, and this is also the case in practice [5,11]. In contrast, it is known that linear convergence rates are achievable for certain other first-order algorithms [44,22].…”

Section: Introduction (mentioning)

confidence: 99%

Last-iterate Convergence in Extensive-Form Games

Lee, Kroer, Luo

2021

Preprint


Regret-based algorithms are highly efficient at finding approximate Nash equilibria in sequential games such as poker games. However, most regret-based algorithms, including counterfactual regret minimization (CFR) and its variants, rely on iterate averaging to achieve convergence. Inspired by recent advances on last-iterate convergence of optimistic algorithms in zero-sum normal-form games, we study this phenomenon in sequential games, and provide a comprehensive study of last-iterate convergence for zero-sum extensive-form games with perfect recall (EFGs), using various optimistic regret-minimization algorithms over treeplexes. This includes algorithms using the vanilla entropy or squared Euclidean norm regularizers, as well as their dilated versions which admit more efficient implementation. In contrast to CFR, we show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast. We also provide experiments to further support our theoretical results.
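To make "iterate averaging" concrete, here is a minimal sketch under simplifying assumptions: it runs regret-matching+ (the per-decision update used inside CFR+) with alternating updates and linearly weighted averaging on a two-player zero-sum matrix game rather than an extensive-form game. The function names, the payoff matrix, and the iteration count are illustrative choices, not taken from any of the cited papers.

```python
import numpy as np


def rm_plus_strategy(q):
    """Play proportionally to the stored nonnegative regrets (uniform if all are zero)."""
    total = q.sum()
    if total > 0:
        return q / total
    return np.full(len(q), 1.0 / len(q))


def exploitability(payoff, x, y):
    """Best-response gain of both players against (x, y); zero at a Nash equilibrium.

    The row player maximizes x^T A y, the column player minimizes it.
    """
    return (payoff @ y).max() - (x @ payoff).min()


def solve_rm_plus(payoff, iterations=5000):
    """Regret-matching+ with alternating updates and linear averaging (illustrative sketch)."""
    n_rows, n_cols = payoff.shape
    q_row, q_col = np.zeros(n_rows), np.zeros(n_cols)
    avg_row, avg_col = np.zeros(n_rows), np.zeros(n_cols)
    x, y = rm_plus_strategy(q_row), rm_plus_strategy(q_col)
    for t in range(1, iterations + 1):
        # Row player updates first, against the column player's current strategy.
        u_row = payoff @ y
        q_row = np.maximum(q_row + u_row - x @ u_row, 0.0)  # clip at zero: the "+" in RM+
        x = rm_plus_strategy(q_row)
        # Column player then updates against the row player's *new* strategy.
        u_col = -(x @ payoff)
        q_col = np.maximum(q_col + u_col - y @ u_col, 0.0)
        y = rm_plus_strategy(q_col)
        # Linear averaging: iterate t receives weight t.
        avg_row += t * x
        avg_col += t * y
    return avg_row / avg_row.sum(), avg_col / avg_col.sum()


if __name__ == "__main__":
    # Hypothetical 2x2 zero-sum game with a unique mixed equilibrium.
    game = np.array([[2.0, -1.0], [-1.0, 1.0]])
    avg_x, avg_y = solve_rm_plus(game)
    print("average strategies:", avg_x, avg_y)
    print("exploitability of the averages:", exploitability(game, avg_x, avg_y))
```

The returned strategies are the weighted averages of the iterates, not the final iterates themselves. That distinction is what the abstract contrasts with last-iterate convergence.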
