Learning in Congestion Games with Bandit Feedback

Cui, Qiwen; Xiong, Zhihan; Fazel, Maryam; Du, Simon S.

doi:10.48550/arxiv.2206.01880

Cited by 2 publications

(3 citation statements)

References 17 publications


“…Additionally, Dou et al (2022) provide sample complexity results for the centralized and decentralized problem settings under both semi-bandit and bandit feedback. Similarly, Cui et al (2022) study the regret of the Nash Q-learning algorithm for two-player turn based Markov games, and introduce the first gap dependent logarithmic upper bounds, which match theoretical lower bounds up to log factors, in the episodic tabular setting.…”

Section: Related Work (mentioning)

confidence: 99%

Instance-dependent Sample Complexity Bounds for Zero-sum Matrix Games

Maiti, Jamieson, Ratliff

2023

Preprint


We study the sample complexity of identifying an approximate equilibrium for two-player zero-sum n × 2 matrix games. That is, in a sequence of repeated game plays, how many rounds must the two players play before reaching an approximate equilibrium (e.g., Nash)? We derive instance-dependent bounds that define an ordering over game matrices that captures the intuition that the dynamics of some games converge faster than others. Specifically, we consider a stochastic observation model such that when the two players choose actions i and j, respectively, they both observe each other's played actions and a stochastic observation. To our knowledge, our work is the first case of instance-dependent lower bounds on the number of rounds the players must play before reaching an approximate equilibrium, in the sense that the number of rounds depends on the specific properties of the game matrix A as well as the desired accuracy. We also prove a converse statement: there exist player strategies that achieve this lower bound.



“…In this part we will show the Nash-UCB algorithm proposed in Cui et al [2022] satisfies Assumption 1. We carry out the proof by pointing out the modifications we need to make in their proof.…”

Section: B3 Proof For Theorem (mentioning)

confidence: 99%

“…1.2 Related Work. (Stationary) multi-agent reinforcement learning. Numerous works have been devoted to learning equilibria in (stationary) multi-agent systems, including zero-sum Markov games [Bai et al, 2020], general-sum Markov games [Mao et al, 2022, Song et al, 2021, Daskalakis et al, 2022, Wang et al, 2023, Cui et al, 2023], Markov potential games [Leonardos et al, 2021, Song et al, 2021, Ding et al, 2022, Cui et al, 2023], congestion games [Cui et al, 2022], extensive-form games [Kozuno et al, 2021], and partially observable Markov games [Liu et al, 2022]. These works aim to learn equilibria with bandit feedback efficiently, measured by either regret or sample complexity.…”

Section: Introduction (mentioning)

confidence: 99%

Halftoning with Multi-Agent Deep Reinforcement Learning

Jiang, Xiong, Jiang, et al.

2022

2022 IEEE International Conference on Image Processing (ICIP)


Deep neural networks have recently succeeded in digital halftoning using vanilla convolutional layers with high parallelism. However, existing deep methods fail to generate halftones with a satisfying blue-noise property and require complex training schemes. In this paper, we propose a halftoning method based on multi-agent deep reinforcement learning, called HALFTONERS, which learns a shared policy to generate high-quality halftone images. Specifically, we view the decision of each binary pixel value as an action of a virtual agent, whose policy is trained by a low-variance policy gradient. Moreover, the blue-noise property is achieved by a novel anisotropy suppressing loss function. Experiments show that our halftoning method produces high-quality halftones while staying relatively fast.
