2019 IEEE 58th Conference on Decision and Control (CDC)
DOI: 10.1109/cdc40024.2019.9029555

Entropy-Regularized Stochastic Games

Abstract: In two-player zero-sum stochastic games, where two competing players make decisions under uncertainty, a pair of optimal strategies is traditionally described by a Nash equilibrium and computed under the assumption that the players have perfect information about the stochastic transition model of the environment. However, implementing such strategies may make the players vulnerable to unforeseen changes in the environment. In this paper, we introduce entropy-regularized stochastic games where each player aims to…
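
The abstract is truncated above. For context, the display below is a minimal sketch of the generic entropy-regularized objective for a two-player zero-sum stochastic game; the discount factor gamma, the temperature tau, and the sign convention on each player's entropy bonus are assumptions made here for illustration, and the paper's exact formulation may differ.

% Sketch: entropy-regularized value of a zero-sum stochastic game
% (standard form; notation assumed, not quoted from the paper).
\[
V_\tau(s) \;=\; \max_{\pi_1}\,\min_{\pi_2}\;
\mathbb{E}\Big[\sum_{t \ge 0} \gamma^t \Big(
      r(s_t, a_t, b_t)
    + \tau\,\mathcal{H}\big(\pi_1(\cdot \mid s_t)\big)
    - \tau\,\mathcal{H}\big(\pi_2(\cdot \mid s_t)\big)
\Big)\Big],
\qquad
\mathcal{H}(p) = -\sum_{a} p(a)\log p(a).
\]

As tau tends to zero the regularized value recovers the unregularized game; a larger tau keeps both players' strategies stochastic, which is one way entropy regularization can hedge against changes in the environment of the kind the abstract alludes to.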

Cited by 6 publications (7 citation statements)
References 28 publications
“…Entropy regularization. The advantages of entropy regularization have been exploited in a diverse array of optimization problems over probability distributions, with prominent examples including equilibrium computation in game theory (Ao et al., 2023; Cen et al., 2022a, 2021; McKelvey and Palfrey, 1995; Mertikopoulos and Sandholm, 2016; Savas et al., 2019) and policy optimization in reinforcement learning (Cen et al., 2022b, 2023; Geist et al., 2019; Lan, 2022; Mei et al., 2020; Neu et al., 2017; Zhan et al., 2023). The idea of employing entropy regularization to speed up convergence in optimal transport has been studied for multiple decades (e.g., Altschuler (2022); Chakrabarty and Khanna (2021); Kalantari et al. (2008); Knight (2008)) and recently popularized by Cuturi (2013).…”
Section: Related Work
confidence: 99%
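To make the optimal-transport remark above concrete, here is a minimal Python sketch of entropy-regularized optimal transport via Sinkhorn iterations in the spirit of Cuturi (2013); the function name, the regularization strength eps, and the fixed iteration count are illustrative choices by this summary, not anything specified in the cited works.

import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=500):
    """Illustrative sketch: entropy-regularized OT via Sinkhorn scaling.

    a, b : source/target marginals (1-D nonnegative arrays summing to 1)
    C    : cost matrix of shape (len(a), len(b))
    eps  : entropy-regularization strength (assumed default)
    """
    K = np.exp(-C / eps)                 # Gibbs kernel induced by the entropy term
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale columns to match marginal b
        u = a / (K @ v)                  # rescale rows to match marginal a
    return u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)

# tiny usage example
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(a, b, C)

Smaller eps approximates the unregularized transport cost more closely but makes the alternating scalings converge more slowly, which is the trade-off behind using entropy regularization to speed up convergence.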
“…Motivated by the algorithmic role of entropy regularization in single-agent RL (Neu et al., 2017; Geist et al., 2019; Cen et al., 2020) as well as its wide use in game theory to account for imperfect and noisy information (McKelvey and Palfrey, 1995; Savas et al., 2019), we initiate the design and analysis of extragradient algorithms using multiplicative updates for finding the so-called quantal response equilibrium (QRE), which are solutions to competitive games with entropy regularization (McKelvey and Palfrey, 1995). While finding QRE is of interest in its own right, by controlling the knob of entropy regularization, the QRE provides a close approximation to the Nash equilibrium (NE), and in turn acts as a smoothing scheme for finding the NE.…”
Section: Our Contributions
confidence: 99%
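As a point of reference for the excerpt above, the quantal response equilibrium of a two-player zero-sum matrix game with payoff matrix A can be written as the saddle point of an entropy-regularized objective. The display below is a standard presentation; the simplices, the temperature tau, and the sign convention are assumed here rather than quoted from the citing paper.

% Sketch: QRE as the saddle point of an entropy-regularized matrix game
% (standard formulation; symbols assumed for illustration).
\[
(x^\star_\tau, y^\star_\tau) \;=\;
\arg\max_{x \in \Delta_m}\,\arg\min_{y \in \Delta_n}\;
x^\top A\, y \;+\; \tau\,\mathcal{H}(x) \;-\; \tau\,\mathcal{H}(y),
\qquad
\mathcal{H}(p) = -\textstyle\sum_i p_i \log p_i .
\]

As tau tends to zero, the pair (x, y) above approaches a Nash equilibrium of the unregularized game, which is the smoothing role of the "knob" described in the excerpt.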
“…In single-agent RL, the role of entropy regularization as an algorithmic mechanism to encourage exploration and accelerate convergence has been investigated extensively (Neu et al., 2017; Geist et al., 2019; Mei et al., 2020; Cen et al., 2020; Lan, 2021; Zhan et al., 2021). Turning to the game setting, entropy regularization is used to account for imperfect information in the seminal work of McKelvey and Palfrey (1995) that introduced the QRE, and a few representative works on entropy and more general regularizations in games include Savas et al. (2019); Hofbauer and Sandholm (2002); Mertikopoulos and Sandholm (2016).…”
Section: Related Work
confidence: 99%
“…In Tang et al. (2021), the properties of the associated exploratory Hamilton-Jacobi-Bellman (HJB) equations are studied, as is the rate of convergence of the optimal exploratory control as exploration goes to zero. Entropy regularization has also been employed in the context of Markov decision processes, as in Neu et al. (2017) and Geist et al. (2019), and in other areas related to optimization such as stochastic games, as in Savas et al. (2019), Guan et al. (2020), and Hao et al. (2022).…”
Section: Introduction
confidence: 99%