2021
DOI: 10.48550/arxiv.2106.03352
Preprint

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

Chi Jin,
Qinghua Liu,
Tiancheng Yu

Abstract: Modern reinforcement learning (RL) commonly engages practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy. While recent progress in RL theory addresses a rich set of RL problems with general function approximation, such successes are mostly restricted to the single-agent setting. It remains elusive how to extend these results to multi-agent RL, especially in the face of new game-theoretical challenges. This paper conside…
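
To make concrete what "function approximation of the value function" means in the abstract, the sketch below fits a linear Q-function by a least-squares value-iteration style regression on toy data. It is a minimal, hypothetical illustration, not the paper's construction: the feature map phi, the synthetic transitions, and all parameter names are placeholders.

    import numpy as np

    def phi(s, a, d=6):
        # Hypothetical feature map for a (state, action) pair in R^d.
        rng = np.random.default_rng(abs(hash((s, a))) % (2**32))
        return rng.standard_normal(d)

    def lsvi_step(transitions, w_target, actions, gamma=0.99, d=6, lam=1e-2):
        # One ridge-regression update of w so that Q(s, a) = phi(s, a) @ w
        # fits the bootstrapped target r + gamma * max_a' phi(s', a') @ w_target.
        A = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next) in transitions:
            x = phi(s, a, d)
            target = r + gamma * max(phi(s_next, ap, d) @ w_target for ap in actions)
            A += np.outer(x, x)
            b += x * target
        return np.linalg.solve(A, b)

    # Toy usage on synthetic transitions (s, a, r, s').
    actions = [0, 1]
    data = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 1)]
    w = np.zeros(6)
    for _ in range(5):
        w = lsvi_step(data, w, actions)
    print("fitted weights:", w)

The point of the sketch is only that the agent never stores a value per state; it stores a weight vector, which is what makes large state spaces tractable and what the paper extends to the multi-agent setting.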

Cited by 11 publications (26 citation statements)
References 35 publications
“…In this section, we focus our attention on theoretical results for the tabular setting, where the numbers of states and actions are finite. We acknowledge that there has been much recent work in RL for continuous state spaces [see, e.g., 21, 23, 56, 24, 55, 25], but this setting is beyond our scope.…”
Section: Related Work
confidence: 99%
“…Most of these works focus on the tabular setting [WHL17, PSPP17, BJ20, BJWX21, BJY20, ZKBY20, ZTLD21] or the linear function approximation setting [XCWY20]. Moreover, the concurrent work of [JLY21] studies competitive RL with general function approximation. In particular, under the decoupled setting where the learner only controls P1, [JLY21] proposes a provably sample-efficient and model-free algorithm based on successive hypothesis elimination and the principle of optimism in the face of uncertainty [LS18].…”
Section: Related Work
confidence: 99%
“…Moreover, the concurrent work of [JLY21] studies competitive RL with general function approximation. In particular, under the decoupled setting where the learner only controls P1, [JLY21] proposes a provably sample-efficient and model-free algorithm based on successive hypothesis elimination and the principle of optimism in the face of uncertainty [LS18]. The proposed algorithm achieves a sublinear O(√K) regret for MDPs with low minimax Eluder dimensions.…”
Section: Related Work
confidence: 99%
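
To make the two ingredients named in the quoted passage more tangible, here is a hedged sketch, assuming a finite candidate class, of successive hypothesis elimination combined with optimism: candidate Q-functions whose prediction error on observed data exceeds a confidence threshold are discarded, and actions are chosen greedily with respect to the most optimistic surviving candidate. This is not the authors' pseudocode; the data format, candidate class, and threshold are illustrative placeholders.

    import numpy as np

    def eliminate(candidates, data, threshold):
        # Keep only candidate Q-functions whose squared prediction error on the
        # observed (s, a, r, v_next) tuples stays below the confidence threshold.
        survivors = []
        for q in candidates:
            err = sum((q(s, a) - (r + v_next)) ** 2 for (s, a, r, v_next) in data)
            if err <= threshold:
                survivors.append(q)
        return survivors

    def optimistic_action(survivors, s, actions):
        # Optimism: rank actions by the largest value any surviving candidate assigns.
        return max(actions, key=lambda a: max(q(s, a) for q in survivors))

    # Toy usage: three linear candidate Q-functions and two observed transitions.
    feats = lambda s, a: np.array([1.0, s, a, s * a])
    weights = [np.array([0.1, 0.2, 0.0, 0.3]),
               np.array([0.0, 0.5, 0.1, 0.0]),
               np.array([0.3, 0.0, 0.4, 0.1])]
    candidates = [lambda s, a, w=w: feats(s, a) @ w for w in weights]
    data = [(0.0, 1, 0.4, 0.2), (1.0, 0, 0.1, 0.3)]
    survivors = eliminate(candidates, data, threshold=0.5)
    print("surviving candidates:", len(survivors))
    print("optimistic action at s=0.5:", optimistic_action(survivors, s=0.5, actions=[0, 1]))

The design intuition matches the quote: elimination shrinks the hypothesis class as data accrues, while optimism over the survivors drives exploration; the regret analysis in terms of the minimax Eluder dimension bounds how long over-optimistic candidates can survive.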
“…In the online setting where the opponent is arbitrary, (Xie et al. 2020; Jin, Liu, and Yu 2021) achieve a regret bound of O(√T) in finite-horizon SGs with linear and general function approximation, respectively. However, in applications where the interaction between the players and the environment is non-stopping (e.g., stock trading), the infinite-horizon SG is more suitable.…”
Section: Related Literature
confidence: 99%