2021
DOI: 10.48550/arxiv.2112.13521
Preprint
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?

Abstract: We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers. In particular, we focus on the class of games where the followers are myopic, i.e., they aim to maximize their instantaneous rewards. For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair (π*, ν*) such that (i) π* is the optimal policy for the leader when the followers always play their best response, and (ii) ν* is the best response of the followers, i.e., a Nash equilibrium of the followers' game induced by π*.
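A minimal formal sketch of the SNE condition described in the abstract (the symbols br(π) and V_1^{π,ν} are illustrative notation assumed here and may differ from the paper's own):

\[
  \nu^*(\pi) \in \mathrm{br}(\pi) := \{\nu : \nu \text{ is a Nash equilibrium of the followers' instantaneous-reward game induced by } \pi\},
\]
\[
  \pi^* \in \operatorname*{arg\,max}_{\pi} \; V_1^{\pi,\,\nu^*(\pi)}(s_1), \qquad \nu^* = \nu^*(\pi^*),
\]

where V_1^{π,ν}(s_1) denotes the leader's expected cumulative reward from the initial state s_1 under the policy pair (π, ν); myopia means each follower best-responds using only its instantaneous reward at each step.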

Cited by 5 publications (6 citation statements)
References 53 publications
“…There are some papers considering multi-agent sequential decision-making systems, including the cooperative setting (Littman, 2001; González-Sánchez and Hernández-Lerma, 2013; Zhang et al., 2018; Perolat et al., 2018; Shi et al., 2022) and the competing setting (Littman, 1994; Auer and Ortner, 2006; Zinkevich et al., 2007; Wei et al., 2017; Fiez et al., 2019; Jin et al., 2020). Zhong et al. (2021) study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers, and establish efficient RL algorithms that achieve the Stackelberg-Nash equilibrium.…”
Section: Related Work
confidence: 99%
“…[86] proposed a gradient-descent algorithm to find an NE for the Stackelberg bandit problem. [95] proposed a value-iteration method for solving the Stackelberg Markov game with a convergence guarantee. [85] features differentiation through the policy gradient on sequential decision problems, without a global convergence guarantee.…”
Section: Related Work
confidence: 99%
“…(Pérolat et al. 2017; Greenwald et al. 2003) propose Q-learning-like algorithms over general-sum Markov games, but do not apply function approximation and only consider stationary strategies, which preclude strategies involving long-range threats such as the SSE. (Zinkevich, Greenwald, and Littman 2005) show a class of general-sum Markov games where value-iteration-like methods necessarily fail. (Zhong et al. 2021) study reinforcement learning in the Stackelberg setting, but only consider followers with myopic best responses.…”
Section: Related Work
confidence: 99%
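As an illustrative, hedged sketch (not the algorithm of Zhong et al. (2021) or of reference [95]): the snippet below shows one backward-induction step of value iteration for a finite-horizon Stackelberg Markov game with a single myopic follower in a tabular setting. All names, array shapes, and the tie-breaking rule are assumptions made for this illustration.

# Hedged sketch: one backward-induction step of Stackelberg value iteration
# with a myopic follower. Tabular setting; illustrative only.
import numpy as np

def stackelberg_vi_step(r_lead, r_fol, P, V_next):
    """One step h of backward induction.

    r_lead, r_fol : arrays of shape (S, A_lead, A_fol), stage rewards.
    P             : array of shape (S, A_lead, A_fol, S), transition kernel.
    V_next        : array of shape (S,), leader's value at step h + 1.
    Returns the leader's value at step h and the greedy policies.
    """
    S, A_lead, A_fol = r_lead.shape
    V = np.zeros(S)
    pi_lead = np.zeros(S, dtype=int)
    pi_fol = np.zeros((S, A_lead), dtype=int)
    for s in range(S):
        # Myopic follower: best-responds to each leader action using only
        # its instantaneous reward (ties broken by argmax order).
        br = np.argmax(r_fol[s], axis=1)          # shape (A_lead,)
        pi_fol[s] = br
        # Leader anticipates the follower's best response and maximizes
        # its own stage reward plus the continuation value.
        q_lead = np.array([
            r_lead[s, a, br[a]] + P[s, a, br[a]] @ V_next
            for a in range(A_lead)
        ])
        pi_lead[s] = int(np.argmax(q_lead))
        V[s] = q_lead[pi_lead[s]]
    return V, pi_lead, pi_fol

Iterating this step from h = H down to h = 1 (with the terminal value set to zero) yields a leader policy that is greedy with respect to the follower's anticipated myopic best response; it is meant only to illustrate the bilevel structure of the problem, not to reproduce the guarantees discussed in the cited works.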