Multi-Agent Reinforcement Learning in Stochastic Networked Systems

Lin, Yiheng; Qu, Guannan; Huang, Longbo; Wierman, Adam

doi:10.48550/arxiv.2006.06555

Cited by 6 publications (12 citation statements)

References 27 publications (91 reference statements)

“…Remark 10 The works in [Qu and Li, 2019, Qu et al, 2020a, Lin et al, 2020, Qu et al, 2020b] are closely related to our contribution, as they also use decay of correlation assumptions to provably avoid the curse of dimensionality in MARL. Our contribution differs from these works in the following main ways:…”

Section: Decentralized NPG (mentioning)

confidence: 54%

“…To take advantage of the local structure of the network, Lin et al [2020] define a property regarding the dependence of Q^π_k(s, a) on the neighbors of k.…”

Section: Exponential Decay (mentioning)

confidence: 99%

“…Definition 4 (Lin et al [2020]) The (c, ψ)-exponential decay property for the Q-function holds if, for any agent k ∈ K and for any (s, a), ( s, a) ∈ S × A such that s…”

Section: Exponential Decay (mentioning)

confidence: 99%

“…These ideas have been applied to the MARL setting [Guestrin et al, 2001a, 2002, Sunehag et al, 2017, Rashid et al, 2018, Zhang et al, 2018a,b, Zhang and Zavlanos, 2019] and have proven successful in experiments, but lack theoretical guarantees or non-asymptotic analysis. A recent line of work has formally considered spatial decay of correlation assumptions for nearest-neighbors dynamics and designed decentralized algorithms based on policy gradient and actor-critic methods [Qu and Li, 2019, Qu et al, 2020a, Lin et al, 2020, Qu et al, 2020b], establishing non-asymptotic convergence guarantees towards a stationary point, but not towards an optimal policy. An application of the same principles to the setting of mean-field MARL [Yang et al, 2018] can be found in Haotian Gu [2021], where the authors show that a neural network based version of the actor-critic algorithm can achieve global convergence.…”

Section: Introduction (mentioning)

confidence: 99%

Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning

Alfano, Rebeschini. 2021. Preprint.

Cooperative multi-agent reinforcement learning is a decentralized paradigm in sequential decision making where agents distributed over a network iteratively collaborate with neighbors to maximize global (network-wide) notions of rewards. Exact computations typically involve a complexity that scales exponentially with the number of agents. To address this curse of dimensionality, we design a scalable algorithm based on the Natural Policy Gradient framework that uses local information and only requires agents to communicate with neighbors within a certain range. Under standard assumptions on the spatial decay of correlations for the transition dynamics of the underlying Markov process and the localized learning policy, we show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity, incurring a localization error that does not depend on the number of agents and converges to zero exponentially fast as a function of the range of communication.

“…We further focus on the case where the communications network is a structural component of the problem setting, as in (Lowe et al, 2017;Zhang et al, 2018). However, a separate but related body of works estimate the communications architecture when agents' behavior is fixed using graph neural networks (Ahilan & Dayan, 2020;Bachrach et al, 2020;Eccles et al, 2019) or statistical tests for correlation between agents' local utilities (Lin et al, 2020;Qu et al, 2020a).…”

Section: Additional Context (mentioning)

confidence: 99%

Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

Koppel, Bedi, Bhargav et al. 2021. Preprint.

In tabular multi-agent reinforcement learning with average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case that the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and full state observability. To date, few global optimality guarantees exist even for this simple setting, as most results yield convergence to stationarity for parameterized policies in large/possibly continuous spaces. To solidify the foundations of MARL, we build upon linear programming (LP) reformulations, for which stochastic primal-dual methods yield a model-free approach to achieve optimal sample complexity in the centralized case. We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging. We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces, and exhibits classical scalings with respect to the network in accordance with multi-agent optimization. Experiments corroborate these results in practice.

Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

Koppel, Bedi, Bhargav et al. 2022. 2022 IEEE 61st Conference on Decision and Control (CDC).

In tabular multi-agent reinforcement learning with average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case that the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and full state observability. To date, few global optimality guarantees exist even for this simple setting, as most results yield convergence to stationarity for parameterized policies in large/possibly continuous spaces. To solidify the foundations of MARL, we build upon linear programming (LP) reformulations, for which stochastic primal-dual methods yield a model-free approach to achieve optimal sample complexity in the centralized case. We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging. We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces, and exhibits classical scalings with respect to the network in accordance with multi-agent optimization. Experiments corroborate these results in practice.
