2009
DOI: 10.1613/jair.2667

Policy Iteration for Decentralized Control of Markov Decision Processes

Abstract: Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration alg…
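To make the DEC-POMDP framework from the abstract concrete, here is a minimal Python sketch of the model as a tuple of states, per-agent actions and observations, joint transition, reward, and observation probabilities, and a discount factor. The class name DecPOMDP, the field layout, and the random_decpomdp test generator are illustrative assumptions for this page, not code or notation taken from the paper.

# Hedged sketch (not from the paper): a DEC-POMDP as a plain data structure.
# Joint actions and joint observations are flattened into single indices.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class DecPOMDP:
    n_states: int
    n_actions: List[int]   # per-agent action counts
    n_obs: List[int]       # per-agent observation counts
    P: np.ndarray          # P[s, a_joint, s']        transition probabilities
    R: np.ndarray          # R[s, a_joint]            joint reward
    Z: np.ndarray          # Z[a_joint, s', o_joint]  observation probabilities
    gamma: float           # discount factor

def random_decpomdp(n_states=2, n_actions=(2, 2), n_obs=(2, 2), gamma=0.9, seed=0):
    """Random, properly normalized two-agent model, handy for testing the sketches below."""
    rng = np.random.default_rng(seed)
    nA, nO = int(np.prod(n_actions)), int(np.prod(n_obs))
    P = rng.random((n_states, nA, n_states))
    P /= P.sum(axis=2, keepdims=True)
    Z = rng.random((nA, n_states, nO))
    Z /= Z.sum(axis=2, keepdims=True)
    R = rng.random((n_states, nA))
    return DecPOMDP(n_states, list(n_actions), list(n_obs), P, R, Z, gamma)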


Cited by 73 publications (82 citation statements)
References 16 publications (13 reference statements)
“…We note that EM is not guaranteed to converge to a global optimum. However, in the experiments we show that EM almost always achieves similar values as the NLP based solver to optimize FSCs (Amato et al, 2010) and much better than DEC-BPI (Bernstein et al, 2009). Key potential advantages of using EM lie in its ability to easily generalize to much richer representations than currently possible for Dec-POMDPs such as hierarchical controllers (Toussaint et al, 2008), and continuous state and action spaces (Hoffman et al, 2009b).…”
Section: Policy Optimization Via Expectation Maximization
mentioning
confidence: 68%
“…In terms of solution representation, most algorithms for infinite-horizon problems represent agent policies as finite-state controllers (Amato, Bernstein, & Zilberstein, 2010; Bernstein, Amato, Hansen, & Zilberstein, 2009), unlike algorithms for finite-horizon problems that often use policy trees (Hansen, Bernstein, & Zilberstein, 2004). The resulting solution is approximate because of the limited memory of the controllers and because optimizing the action selection and transition parameters is extremely hard.…”
Section: Related Work
mentioning
confidence: 99%
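The finite-state controller representation discussed in this excerpt can be captured by two conditional distributions per agent: an action-selection distribution at each controller node and a node-transition distribution conditioned on the action taken and the observation received. The class below is an illustrative sketch under that assumption; the names FiniteStateController, psi, and eta are ours, not identifiers from the cited implementations.

# Hedged sketch: a stochastic finite-state controller for one agent.
# psi[q, a]        = P(a | q)          action selection at node q
# eta[q, a, o, q'] = P(q' | q, a, o)   node transition after acting and observing
import numpy as np

class FiniteStateController:
    def __init__(self, n_nodes, n_actions, n_obs, seed=0):
        rng = np.random.default_rng(seed)
        self.psi = rng.random((n_nodes, n_actions))
        self.psi /= self.psi.sum(axis=1, keepdims=True)
        self.eta = rng.random((n_nodes, n_actions, n_obs, n_nodes))
        self.eta /= self.eta.sum(axis=3, keepdims=True)

    def act(self, q, rng):
        """Sample an action at controller node q."""
        return rng.choice(self.psi.shape[1], p=self.psi[q])

    def next_node(self, q, a, o, rng):
        """Sample the successor node given the last action and observation."""
        return rng.choice(self.eta.shape[3], p=self.eta[q, a, o])

Fixing n_nodes in advance is exactly the limited memory the excerpt refers to: larger controllers can represent richer policies but make the parameter optimization harder.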
“…On the other hand, finite-state controllers (FSCs) are a major model for representing infinite-horizon Dec-POMDP policies. Several optimization techniques have been used to approximate the parameters of FSCs, for example linear programming [7], nonlinear programming [8] and expectation-maximization [9], [10]. However, identifying the best situations for communication has not been considered by existing methods.…”
Section: Introduction
mentioning
confidence: 99%
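The linear-programming, nonlinear-programming, and EM methods cited in this excerpt all rely on evaluating a fixed joint controller: for fixed parameters, the value V(s, q1, q2) of every state and pair of controller nodes satisfies a linear Bellman-style system. The sketch below shows that evaluation step for two agents, assuming the DecPOMDP and FiniteStateController sketches given earlier on this page; the function name and the flattening of joint actions and observations are illustrative assumptions, not the cited algorithms themselves.

# Hedged sketch: evaluate two fixed stochastic controllers on a DecPOMDP.
# Builds the linear system (I - gamma * T) V = r over the joint space (s, q1, q2).
from itertools import product
import numpy as np

def evaluate_joint_fsc(model, fsc1, fsc2):
    S = model.n_states
    Q1, A1 = fsc1.psi.shape
    Q2, A2 = fsc2.psi.shape
    O1, O2 = model.n_obs
    n = S * Q1 * Q2

    def idx(s, q1, q2):                          # flatten (s, q1, q2) into one index
        return (s * Q1 + q1) * Q2 + q2

    T = np.zeros((n, n))                         # successor weights; discount applied below
    r = np.zeros(n)                              # expected immediate reward
    for s, q1, q2 in product(range(S), range(Q1), range(Q2)):
        i = idx(s, q1, q2)
        for a1, a2 in product(range(A1), range(A2)):
            pa = fsc1.psi[q1, a1] * fsc2.psi[q2, a2]
            a = a1 * A2 + a2                     # flattened joint action
            r[i] += pa * model.R[s, a]
            for s2, o1, o2 in product(range(S), range(O1), range(O2)):
                o = o1 * O2 + o2                 # flattened joint observation
                w = pa * model.P[s, a, s2] * model.Z[a, s2, o]
                for n1, n2 in product(range(Q1), range(Q2)):
                    T[i, idx(s2, n1, n2)] += (w * fsc1.eta[q1, a1, o1, n1]
                                                * fsc2.eta[q2, a2, o2, n2])
    V = np.linalg.solve(np.eye(n) - model.gamma * T, r)
    return V.reshape(S, Q1, Q2)

With the earlier sketches in the same module, evaluate_joint_fsc(random_decpomdp(), FiniteStateController(3, 2, 2), FiniteStateController(3, 2, 2, seed=1)) returns the value of every (state, node, node) combination; weighting that array by an initial state distribution and initial controller nodes gives the scalar objective that the cited optimization methods maximize.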