2017
DOI: 10.48550/arxiv.1712.00579
Preprint
Online Reinforcement Learning in Stochastic Games

Abstract: We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the UCSG algorithm, which achieves sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves on previous ones in the same setting. The regret bound has a dependency on the diameter, which is an intrinsic…
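To make the benchmark concrete: a standard way to formalize regret against the game value in this setting (a sketch of the usual definition; the paper's exact formulation may differ in details such as where expectations are taken) is

\[
\mathrm{Reg}(T) \;=\; T\,v^{*} \;-\; \sum_{t=1}^{T} r_t,
\]

where \(v^{*}\) is the average-reward value of the SG (the payoff the learner can guarantee under minimax play) and \(r_t\) is the learner's realized one-step payoff at time \(t\). A sublinear bound \(\mathrm{Reg}(T) = o(T)\) means the learner's long-run average payoff approaches \(v^{*}\) even against an arbitrary opponent.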

Cited by 17 publications (7 citation statements) | References 7 publications
“…Markov Game (MG), also known as a stochastic game [42], is a popular model in multi-agent RL [32]. Early works have mainly focused on finding Nash equilibria of MGs with known transitions and rewards [33,21,20,53], or under strong reachability conditions such as access to simulators [52,23,43,63,54], where exploration is not needed. A recent line of work provides non-asymptotic guarantees for learning two-player zero-sum tabular MGs in the setting that requires strategic exploration.…”
Section: Introduction (confidence: 99%)
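As a concrete illustration of the "known transition and reward" setting mentioned in the statement above, here is a minimal sketch of Shapley-style value iteration for a two-player zero-sum Markov game, solving each per-state matrix game with a linear program. It uses the discounted criterion for simplicity (the paper under review treats the average-reward criterion), and all names (P, R, matrix_game_value, and so on) are illustrative, not taken from the cited works.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    # Value of the zero-sum matrix game Q (row player maximizes) via an LP:
    # maximize v subject to sum_a x_a * Q[a, b] >= v for every column b,
    # with x a probability distribution over rows.
    m, n = Q.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # linprog minimizes, so minimize -v
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])    # v - sum_a x_a * Q[a, b] <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)  # sum_a x_a = 1
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_value_iteration(P, R, gamma=0.95, iters=300):
    # P[s, a, b, s']: known transition probabilities; R[s, a, b]: one-step
    # payoffs to the row player (the column player receives -R). Returns V
    # with V[s] approximating the value of the discounted game from state s.
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        # Stage game at s: immediate payoff plus discounted continuation value.
        V = np.array([matrix_game_value(R[s] + gamma * P[s] @ V)
                      for s in range(n_states)])
    return V

This covers only the planning step with a known model; in the online setting surveyed here, the model is unknown, which is what optimism-based methods such as UCSG are designed to handle.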
“…In the online setting, there is a recent line of research that proposes provably efficient RL algorithms for zero-sum Markov games; see, e.g., Wei et al. (2017). Compared with these aforementioned works, we focus on solving the Stackelberg-Nash equilibrium, which involves a bilevel structure and is fundamentally different from the Nash equilibrium. Thus, our work is not directly comparable.…”
Section: Related Work (confidence: 99%)
“…Stochastic games (SGs) (Shapley 1953; Deng et al. 2021) offer a multi-player game framework where agents jointly determine the loss and the state transition. Compared to OMDPs, the main difference is that SGs allow each player to have a representation of states, actions, and rewards, so players can learn these representations over time and find the NE of the stochastic game (Wei, Hong, and Lu 2017; Tian et al. 2020). Performance in SGs is often measured by the difference between the average loss and the value of the game (i.e., the value when both players play a NE), which is a weaker notion of regret than competing with the best fixed policy in hindsight in OMDPs.…”
Section: Related Work (confidence: 99%)
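To spell out the comparison in the statement above (the notation here is ours, not the cited papers'): with one-step losses \(\ell_t\),

\[
\mathrm{Reg}^{\mathrm{SG}}(T) \;=\; \sum_{t=1}^{T} \ell_t \;-\; T\,v^{*},
\qquad
\mathrm{Reg}^{\mathrm{OMDP}}(T) \;=\; \sum_{t=1}^{T} \ell_t \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t(\pi).
\]

Against a suboptimal opponent, the best fixed policy in hindsight can incur total loss below \(T\,v^{*}\), so a bound on \(\mathrm{Reg}^{\mathrm{SG}}\) does not imply one on \(\mathrm{Reg}^{\mathrm{OMDP}}\); in this sense competing with the game value is the weaker benchmark.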
“…Intuitively, the player can learn the structure of the game (i.e., transition model, reward function) over time, thus on average, the player can calculate and compete with the value of the game. In non-episodic settings, the Upper Confidence Stochastic Game algorithm (UCSG) (Wei, Hong, and Lu 2017)…”
Section: Related Work (confidence: 99%)