2021
DOI: 10.48550/arxiv.2112.13521
Preprint
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?

Abstract: We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers. In particular, we focus on the class of games where the followers are myopic, i.e., they aim to maximize their instantaneous rewards. For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair (π*, ν*) such that (i) π* is the optimal policy for the leader when the followers always play their best response, and (ii) ν* is the best response of the followers, i.e., a Nash equilibrium of the followers' game induced by π*.
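A minimal formal sketch of the SNE condition described in the abstract (the symbols br(π) and V_1^{π,ν} are illustrative notation assumed here and may differ from the paper's own):

\[
  \nu^*(\pi) \in \mathrm{br}(\pi) := \{\nu : \nu \text{ is a Nash equilibrium of the followers' instantaneous-reward game induced by } \pi\},
\]
\[
  \pi^* \in \operatorname*{arg\,max}_{\pi} \; V_1^{\pi,\,\nu^*(\pi)}(s_1), \qquad \nu^* = \nu^*(\pi^*),
\]

where V_1^{π,ν}(s_1) denotes the leader's expected cumulative reward from the initial state s_1 under the policy pair (π, ν); myopia means each follower best-responds using only its instantaneous reward at each step.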

Cited by 5 publications (6 citation statements)
References 53 publications
“…There are some papers considering multi-agent sequential decision-making systems, including the cooperative setting (Littman, 2001; González-Sánchez and Hernández-Lerma, 2013; Zhang et al., 2018; Perolat et al., 2018; Shi et al., 2022) and the competing setting (Littman, 1994; Auer and Ortner, 2006; Zinkevich et al., 2007; Wei et al., 2017; Fiez et al., 2019; Jin et al., 2020). Zhong et al. (2021) study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers, and establish efficient RL algorithms that achieve the Stackelberg-Nash equilibrium.…”
Section: Related Work
confidence: 99%
“…[86] proposed a gradient-descent algorithm to find an NE for the Stackelberg bandit problem. [95] proposed a value-iteration method for solving the Stackelberg Markov game with a convergence guarantee. [85] features differentiation through the policy gradient on sequential decision problems, without a global convergence guarantee.…”
Section: Related Work
confidence: 99%
“…(Pérolat et al. 2017; Greenwald et al. 2003) propose Q-learning-like algorithms over general-sum Markov games, but do not apply function approximation and only consider stationary strategies, which preclude strategies involving long-range threats such as the SSE. (Zinkevich, Greenwald, and Littman 2005) show a class of general-sum Markov games where value-iteration-like methods necessarily fail. (Zhong et al. 2021) study reinforcement learning in the Stackelberg setting, but only consider followers with myopic best responses.…”
Section: Related Work
confidence: 99%
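As an illustrative, hedged sketch (not the algorithm of Zhong et al. (2021) or of reference [95]): the snippet below shows one backward-induction step of value iteration for a finite-horizon Stackelberg Markov game with a single myopic follower in a tabular setting. All names, array shapes, and the tie-breaking rule are assumptions made for this illustration.

# Hedged sketch: one backward-induction step of Stackelberg value iteration
# with a myopic follower. Tabular setting; illustrative only.
import numpy as np

def stackelberg_vi_step(r_lead, r_fol, P, V_next):
    """One step h of backward induction.

    r_lead, r_fol : arrays of shape (S, A_lead, A_fol), stage rewards.
    P             : array of shape (S, A_lead, A_fol, S), transition kernel.
    V_next        : array of shape (S,), leader's value at step h + 1.
    Returns the leader's value at step h and the greedy policies.
    """
    S, A_lead, A_fol = r_lead.shape
    V = np.zeros(S)
    pi_lead = np.zeros(S, dtype=int)
    pi_fol = np.zeros((S, A_lead), dtype=int)
    for s in range(S):
        # Myopic follower: best-responds to each leader action using only
        # its instantaneous reward (ties broken by argmax order).
        br = np.argmax(r_fol[s], axis=1)          # shape (A_lead,)
        pi_fol[s] = br
        # Leader anticipates the follower's best response and maximizes
        # its own stage reward plus the continuation value.
        q_lead = np.array([
            r_lead[s, a, br[a]] + P[s, a, br[a]] @ V_next
            for a in range(A_lead)
        ])
        pi_lead[s] = int(np.argmax(q_lead))
        V[s] = q_lead[pi_lead[s]]
    return V, pi_lead, pi_fol

Iterating this step from h = H down to h = 1 (with the terminal value set to zero) yields a leader policy that is greedy with respect to the follower's anticipated myopic best response; it is meant only to illustrate the bilevel structure of the problem, not to reproduce the guarantees discussed in the cited works.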