2020
DOI: 10.48550/arxiv.2002.07066
Preprint

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Abstract: We develop provably efficient reinforcement learning algorithms for two-player zero-sum Markov games in which the two players simultaneously take actions. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and the goal is to find the Nash Equilibrium efficiently by minimizing the worst-case duali…
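To make the setting in the abstract concrete, here is a minimal sketch of the linear structure and the offline objective, using standard notation for linear Markov games rather than the paper's exact statement (the symbols $\phi$, $\theta_h$, $\mu_h$, $\pi$, $\nu$ are assumptions of this sketch):

$$ r_h(s, a, b) = \phi(s, a, b)^\top \theta_h, \qquad \mathbb{P}_h(\cdot \mid s, a, b) = \phi(s, a, b)^\top \mu_h(\cdot), $$

where $\phi$ is a known $d$-dimensional feature map shared by the reward and transition. The worst-case duality gap of a policy pair $(\pi, \nu)$ for the max- and min-players is

$$ \mathrm{Gap}(\pi, \nu) = \max_{\pi'} V_1^{\pi', \nu}(s_1) - \min_{\nu'} V_1^{\pi, \nu'}(s_1) \;\ge\; 0, $$

which equals zero exactly when $(\pi, \nu)$ is a Nash equilibrium, so driving this gap to zero is what finding the Nash Equilibrium amounts to in the offline setting.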

Cited by 10 publications (42 citation statements)
References 60 publications

“…We propose an algorithm that incurs at most $O(H\sqrt{d_E K \log N_{\mathcal{F}}})$ regret in $K$ episodes, where $d_E$ denotes the Minimax Eluder dimension and $N_{\mathcal{F}}$ denotes the covering number of the function class. As a special case, this result improves [XCWY20] by a $\sqrt{d}$ multiplicative factor when the reward function and transition kernel are linearly parameterized and $d$ is the dimension of the feature mapping.…”
Section: Introduction (mentioning)
confidence: 86%
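For context, the regret in this statement is measured over $K$ episodes against the minimax value; a common definition in this line of work (the notation below is an assumption of this note, not a quote from either paper) is

$$ \mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V_1^{*}(s_1) - V_1^{\pi^k,\, \mathrm{br}(\pi^k)}(s_1) \Big), $$

where $V_1^{*}$ is the Nash value of the game, $\pi^k$ is the learner's policy in episode $k$, and $\mathrm{br}(\pi^k)$ is the opponent's best response to it.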
“…However, without a strong sampling model or a well-explored policy, the issue of the exploration-exploitation tradeoff must be addressed. Most of these works focus on the tabular setting [WHL17, PSPP17, BJ20, BJWX21, BJY20, ZKBY20, ZTLD21] or linear function approximation settings [XCWY20]. Moreover, the concurrent work of [JLY21] studies competitive RL with general function approximation.…”
Section: Related Work (mentioning)
confidence: 99%
“…While several provable decentralized MARL algorithms have been developed [see, e.g., 57,40,13], they either have only asymptotic guarantees or work only under certain reachability assumptions (see Section 1.1). The existing provably efficient algorithms for general Markov games (without further assumptions) are exclusively centralized algorithms [2,55,30].…”
Section: Objective (mentioning)
confidence: 99%
“…In this section, we focus our attention on theoretical results for the tabular setting, where the numbers of states and actions are finite. We acknowledge that there has been much recent work in RL for continuous state spaces [see, e.g., 21,23,56,24,55,25], but this setting is beyond our scope.…”
Section: Related Work (mentioning)
confidence: 99%