2009
DOI: 10.1613/jair.2667

Policy Iteration for Decentralized Control of Markov Decision Processes

Abstract: Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration alg…
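To make the DEC-POMDP framework from the abstract concrete, here is a minimal Python sketch of the model as a tuple of states, per-agent actions and observations, joint transition, reward, and observation probabilities, and a discount factor. The class name DecPOMDP, the field layout, and the random_decpomdp test generator are illustrative assumptions for this page, not code or notation taken from the paper.

# Hedged sketch (not from the paper): a DEC-POMDP as a plain data structure.
# Joint actions and joint observations are flattened into single indices.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class DecPOMDP:
    n_states: int
    n_actions: List[int]   # per-agent action counts
    n_obs: List[int]       # per-agent observation counts
    P: np.ndarray          # P[s, a_joint, s']        transition probabilities
    R: np.ndarray          # R[s, a_joint]            joint reward
    Z: np.ndarray          # Z[a_joint, s', o_joint]  observation probabilities
    gamma: float           # discount factor

def random_decpomdp(n_states=2, n_actions=(2, 2), n_obs=(2, 2), gamma=0.9, seed=0):
    """Random, properly normalized two-agent model, handy for testing the sketches below."""
    rng = np.random.default_rng(seed)
    nA, nO = int(np.prod(n_actions)), int(np.prod(n_obs))
    P = rng.random((n_states, nA, n_states))
    P /= P.sum(axis=2, keepdims=True)
    Z = rng.random((nA, n_states, nO))
    Z /= Z.sum(axis=2, keepdims=True)
    R = rng.random((n_states, nA))
    return DecPOMDP(n_states, list(n_actions), list(n_obs), P, R, Z, gamma)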


Cited by 73 publications (82 citation statements)
References 16 publications (13 reference statements)
“…We note that EM is not guaranteed to converge to a global optimum. However, in the experiments we show that EM almost always achieves similar values as the NLP based solver to optimize FSCs (Amato et al, 2010) and much better than DEC-BPI (Bernstein et al, 2009). Key potential advantages of using EM lie in its ability to easily generalize to much richer representations than currently possible for Dec-POMDPs such as hierarchical controllers (Toussaint et al, 2008), and continuous state and action spaces (Hoffman et al, 2009b).…”
Section: Policy Optimization Via Expectation Maximization
mentioning
confidence: 68%
“…In terms of solution representation, most algorithms for infinite-horizon problems represent agent policies as finite-state controllers (Amato, Bernstein, & Zilberstein, 2010; Bernstein, Amato, Hansen, & Zilberstein, 2009), unlike algorithms for finite-horizon problems that often use policy trees (Hansen, Bernstein, & Zilberstein, 2004). The resulting solution is approximate because of the limited memory of the controllers and because optimizing the action selection and transition parameters is extremely hard.…”
Section: Related Work
mentioning
confidence: 99%
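The finite-state controller representation discussed in this excerpt can be captured by two conditional distributions per agent: an action-selection distribution at each controller node and a node-transition distribution conditioned on the action taken and the observation received. The class below is an illustrative sketch under that assumption; the names FiniteStateController, psi, and eta are ours, not identifiers from the cited implementations.

# Hedged sketch: a stochastic finite-state controller for one agent.
# psi[q, a]        = P(a | q)          action selection at node q
# eta[q, a, o, q'] = P(q' | q, a, o)   node transition after acting and observing
import numpy as np

class FiniteStateController:
    def __init__(self, n_nodes, n_actions, n_obs, seed=0):
        rng = np.random.default_rng(seed)
        self.psi = rng.random((n_nodes, n_actions))
        self.psi /= self.psi.sum(axis=1, keepdims=True)
        self.eta = rng.random((n_nodes, n_actions, n_obs, n_nodes))
        self.eta /= self.eta.sum(axis=3, keepdims=True)

    def act(self, q, rng):
        """Sample an action at controller node q."""
        return rng.choice(self.psi.shape[1], p=self.psi[q])

    def next_node(self, q, a, o, rng):
        """Sample the successor node given the last action and observation."""
        return rng.choice(self.eta.shape[3], p=self.eta[q, a, o])

Fixing n_nodes in advance is exactly the limited memory the excerpt refers to: larger controllers can represent richer policies but make the parameter optimization harder.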
“…On the other hand, finite-state controllers (FSCs) are a major model for representing infinite-horizon Dec-POMDP policies. Several optimization techniques have been used to approximate the parameters of FSCs, for example linear programming [7], nonlinear programming [8] and expectation-maximization [9], [10]. However, identifying the best situations for communication has not been considered by existing methods.…”
Section: Introduction
mentioning
confidence: 99%
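The linear-programming, nonlinear-programming, and EM methods cited in this excerpt all rely on evaluating a fixed joint controller: for fixed parameters, the value V(s, q1, q2) of every state and pair of controller nodes satisfies a linear Bellman-style system. The sketch below shows that evaluation step for two agents, assuming the DecPOMDP and FiniteStateController sketches given earlier on this page; the function name and the flattening of joint actions and observations are illustrative assumptions, not the cited algorithms themselves.

# Hedged sketch: evaluate two fixed stochastic controllers on a DecPOMDP.
# Builds the linear system (I - gamma * T) V = r over the joint space (s, q1, q2).
from itertools import product
import numpy as np

def evaluate_joint_fsc(model, fsc1, fsc2):
    S = model.n_states
    Q1, A1 = fsc1.psi.shape
    Q2, A2 = fsc2.psi.shape
    O1, O2 = model.n_obs
    n = S * Q1 * Q2

    def idx(s, q1, q2):                          # flatten (s, q1, q2) into one index
        return (s * Q1 + q1) * Q2 + q2

    T = np.zeros((n, n))                         # successor weights; discount applied below
    r = np.zeros(n)                              # expected immediate reward
    for s, q1, q2 in product(range(S), range(Q1), range(Q2)):
        i = idx(s, q1, q2)
        for a1, a2 in product(range(A1), range(A2)):
            pa = fsc1.psi[q1, a1] * fsc2.psi[q2, a2]
            a = a1 * A2 + a2                     # flattened joint action
            r[i] += pa * model.R[s, a]
            for s2, o1, o2 in product(range(S), range(O1), range(O2)):
                o = o1 * O2 + o2                 # flattened joint observation
                w = pa * model.P[s, a, s2] * model.Z[a, s2, o]
                for n1, n2 in product(range(Q1), range(Q2)):
                    T[i, idx(s2, n1, n2)] += (w * fsc1.eta[q1, a1, o1, n1]
                                                * fsc2.eta[q2, a2, o2, n2])
    V = np.linalg.solve(np.eye(n) - model.gamma * T, r)
    return V.reshape(S, Q1, Q2)

With the earlier sketches in the same module, evaluate_joint_fsc(random_decpomdp(), FiniteStateController(3, 2, 2), FiniteStateController(3, 2, 2, seed=1)) returns the value of every (state, node, node) combination; weighting that array by an initial state distribution and initial controller nodes gives the scalar objective that the cited optimization methods maximize.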