Ziang Song scite author profile

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

Multi-agent reinforcement learning has made substantial empirical progress in solving games with a large number of players. However, theoretically, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially in the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games with $H$ steps, $S$ states, and $A_i$ actions per player. First, we design algorithms for learning an $\varepsilon$-Coarse Correlated Equilibrium (CCE) in $O(H^5 S \max_{i\le m} A_i/\varepsilon^2)$ episodes, and an $\varepsilon$-Correlated Equilibrium (CE) in $O(H^6 S \max_{i\le m} A_i^2/\varepsilon^2)$ episodes. This is the first line of results for learning CCE and CE with sample complexities polynomial in $\max_{i\le m} A_i$. Our algorithm for learning CE integrates an adversarial bandit subroutine which minimizes a weighted swap regret, along with several novel designs in the outer loop. Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\varepsilon$-approximate Nash equilibrium within $O(S \sum_{i\le m} A_i/\varepsilon^3)$ episodes (when only highlighting the dependence on $S$, $A_i$, and $\varepsilon$), which depends only linearly on $\sum_{i\le m} A_i$ and significantly improves over existing efficient algorithms in the $\varepsilon$ dependence. Overall, our results shed light on what equilibria or structural assumptions on the game may enable sample-efficient learning with many players.
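
To make the scaling contrast in this abstract concrete, the following minimal Python sketch compares the size of the joint action space (the product of the per-player action counts) with the two quantities the stated bounds depend on: the largest per-player action count and the sum of the action counts. The function name `action_space_sizes` and the example game are illustrative assumptions, not taken from the paper.

```python
from math import prod


def action_space_sizes(actions_per_player):
    """Return (joint, largest, total) for a list of per-player action counts.

    joint   = product of all A_i (size of the joint action space)
    largest = max_i A_i          (appears in the CCE/CE bounds)
    total   = sum_i A_i          (appears in the Markov Potential Game bound)
    """
    joint = prod(actions_per_player)
    largest = max(actions_per_player)
    total = sum(actions_per_player)
    return joint, largest, total


# Hypothetical game: 10 players with 4 actions each.
joint, largest, total = action_space_sizes([4] * 10)
print(joint, largest, total)
```

For this hypothetical game the joint action space has 4^10 = 1,048,576 elements, while the largest per-player count is 4 and the sum is 40, which is the gap the abstract refers to.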


Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

Song, Song. 2022. Preprint.


Imperfect-Information Extensive-Form Games (IIEFGs) are a prevalent model for real-world games involving imperfect information and sequential play. The Extensive-Form Correlated Equilibrium (EFCE) has been proposed as a natural solution concept for multi-player general-sum IIEFGs. However, existing algorithms for finding an EFCE require full feedback from the game, and it remains open how to efficiently learn the EFCE in the more challenging bandit feedback setting, where the game can only be learned through observations from repeated play. This paper presents the first sample-efficient algorithm for learning the EFCE from bandit feedback. We begin by proposing the $K$-EFCE, a more general definition that allows players to observe and deviate from the recommended actions $K$ times. The $K$-EFCE includes the EFCE as a special case at $K = 1$, and becomes an increasingly strict notion of equilibrium as $K$ increases. We then design an uncoupled no-regret algorithm that finds an $\varepsilon$-approximate $K$-EFCE within $O(\max_i X_i A_i^K/\varepsilon^2)$ iterations in the full feedback setting, where $X_i$ and $A_i$ are the numbers of information sets and actions for the $i$-th player. Our algorithm works by minimizing a wide-range regret at each information set that takes into account all possible recommendation histories. Finally, we design a sample-based variant of our algorithm that learns an $\varepsilon$-approximate $K$-EFCE within $O(\max_i X_i A_i^{K+1}/\varepsilon^2)$ episodes of play in the bandit feedback setting. When specialized to $K = 1$, this gives the first sample-efficient algorithm for learning EFCE from bandit feedback.
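
As a rough illustration of how the two stated rates compare, the sketch below evaluates the leading term max_i X_i A_i^p / ε², with p = K for full feedback and p = K + 1 for bandit feedback. It ignores the constants and any lower-order factors hidden by the O(·) notation, so the outputs are not actual iteration counts. The function name `kefce_leading_term` and the example numbers are assumptions made for illustration only.

```python
def kefce_leading_term(info_sets, actions, k, epsilon, bandit=False):
    """Evaluate max_i X_i * A_i**p / epsilon**2.

    p = k for the full feedback rate and p = k + 1 for the bandit
    feedback rate, as stated in the abstract. Constants and
    lower-order factors are ignored (an assumption of this sketch).
    """
    exponent = k + 1 if bandit else k
    leading = max(x * a ** exponent for x, a in zip(info_sets, actions))
    return leading / epsilon ** 2


# Hypothetical two-player game: X = (100, 50) information sets, A = (3, 4) actions.
full = kefce_leading_term([100, 50], [3, 4], k=1, epsilon=0.1)
bandit = kefce_leading_term([100, 50], [3, 4], k=1, epsilon=0.1, bandit=True)
print(full, bandit)
```

In this hypothetical example the bandit term exceeds the full feedback term by one extra factor of the action count for the player attaining the maximum.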


Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

Feng, Li, Song, et al. 2022. Preprint.


The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation without considering the context dependency within documents, leading to inadequate document-level coherence. Some recent research has tried to mitigate this issue by introducing an additional context encoder or by translating multiple sentences or even the entire document at once. Such methods may lose information on the target side or incur increasing computational complexity as documents get longer. To address these problems, we introduce a recurrent memory unit to the vanilla Transformer, which supports information exchange between the sentence and the previous context. The memory unit is recurrently updated by acquiring information from sentences and passing the aggregated knowledge back to subsequent sentence states. We follow a two-stage training strategy, in which the model is first trained at the sentence level and then fine-tuned for document-level translation. We conduct experiments on three popular datasets for document-level machine translation, and our model has an average improvement of 0.91 s-BLEU over the sentence-level baseline. We also achieve state-of-the-art results on TED and News, outperforming the previous work by 0.36 s-BLEU and 1.49 d-BLEU on average.


Fast electrochemical impedance spectroscopy of lithium-ion batteries based on the large square wave excitation signal

Wang, Song, Zhu, et al. 2023. iScience.



Satellite-based phase-matching quantum key distribution

Cui, Song, Huang, et al. 2022. Quantum Inf Process.


Fast protection strategy for monopole grounding fault of low-voltage DC microgrid

Peng, Song, Zeng, et al. 2023. Electric Power Systems Research.


Efficient $Φ$-Regret Minimization in Extensive-Form Games via Online Mirror Descent

Jin, Song, Song, et al. 2022. Preprint.


A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the Φ-Hedge algorithm, a generic algorithm capable of learning a large class of equilibria for NFGs. We show that Φ-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the Φ-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time. This new connection further allows us to design and analyze a new class of OMD algorithms based on modifying its log-partition function. In particular, we design an improved algorithm with balancing techniques that achieves a sharp $O(\sqrt{XAT})$ EFCE-regret under bandit feedback in an EFG with $X$ information sets, $A$ actions, and $T$ episodes. To the best of our knowledge, this is the first such rate and matches the information-theoretic lower bound.
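
For readers unfamiliar with the Hedge family, the sketch below shows plain Hedge (exponential weights) over a finite action set with full-information losses. It is the textbook building block only, not the Φ-Hedge algorithm, the dilated-regularizer OMD, or the balanced bandit variant described in the abstract. The function name `hedge`, the learning rate, and the example loss sequence are assumptions for illustration.

```python
import math


def hedge(loss_rounds, learning_rate):
    """Run plain Hedge and return the distribution played in each round.

    loss_rounds is a list of per-round loss vectors, one entry per action.
    In each round the weight of an action is proportional to
    exp(-learning_rate * cumulative loss of that action so far).
    """
    num_actions = len(loss_rounds[0])
    cumulative = [0.0] * num_actions
    played = []
    for losses in loss_rounds:
        # Subtract the smallest cumulative loss for numerical stability;
        # this does not change the normalized distribution.
        shift = min(cumulative)
        weights = [math.exp(-learning_rate * (c - shift)) for c in cumulative]
        total = sum(weights)
        played.append([w / total for w in weights])
        # Full-information update: every action's loss is observed.
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return played


# Hypothetical loss sequence: action 0 always costs 1, action 1 always costs 0.
distributions = hedge([[1.0, 0.0]] * 50, learning_rate=0.5)
print(distributions[0], distributions[-1])
```

On this toy sequence the first distribution is uniform and the probability mass shifts toward the zero-loss action over the rounds.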




