Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006
DOI: 10.1145/1143844.1143963
Probabilistic inference for solving discrete and continuous state Markov Decision Processes

Abstract: Inference in Markov Decision Processes has recently received interest as a means to infer goals of an observed action, policy recognition, and also as a tool to compute policies. A particularly interesting aspect of the approach is that any existing inference technique in DBNs now becomes available for answering behavioral questions, including those on continuous, factorial, or hierarchical state representations. Here we present an Expectation Maximization algorithm for computing optimal policies. Unlike previous…
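To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' exact algorithm; all names such as em_policy_iteration, P, R and gamma are illustrative. In the paper's formulation the policy-evaluation step is carried out by forward-backward inference in a mixture of finite-horizon DBNs, with rewards rescaled to [0, 1] so they can be read as probabilities of a reward event; in the sketch below that step is replaced by a direct linear solve, so the code alternates an E-like evaluation step with an M-like greedy improvement step (structurally, policy iteration) rather than doing the message passing itself.

import numpy as np

def em_policy_iteration(P, R, gamma, iters=100):
    # P: (A, S, S) transition tensor, P[a, s, s'] = p(s' | s, a)
    # R: (S, A) rewards, assumed rescaled to [0, 1] so they can be read as
    #    p(reward event | s, a) in the inference view
    # gamma: discount factor, playing the role of a geometric prior over the
    #    time at which the reward event is emitted
    S, A = R.shape
    pi = np.zeros(S, dtype=int)            # arbitrary initial deterministic policy
    for _ in range(iters):
        # E-like step: evaluate the current policy (closed-form solve here;
        # message passing in the mixture-of-horizons model in the paper)
        P_pi = P[pi, np.arange(S), :]      # (S, S) transitions under pi
        r_pi = R[np.arange(S), pi]         # (S,) expected reward under pi
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # M-like step: greedy policy improvement w.r.t. the evaluated values
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            break
        pi = new_pi
    return pi, V

On a small random MDP this sketch should recover the same policy as standard value iteration; the point of the paper is that the evaluation step can be phrased entirely as inference in a DBN, so approximate and structured inference machinery (continuous, factorial, hierarchical state) carries over.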

Cited by 190 publications (177 citation statements) | References 5 publications
“…A first step in this direction was already made in Wiegerinck et al (2006), van den Broek et al (2008a). In this case, we have considered the KL-stag-hunt game and shown that BP provides a good approximation and allows to analyze the behavior of large systems, where exact inference is not feasible.…”
Section: Discussion
confidence: 86%
“…The KL control approach proposed in this paper also bears some relation to the EM approach of Toussaint and Storkey (2006), who consider the discounted reward case with 0, 1 rewards. The posterior can be considered a mixture over times at which rewards are incorporated.…”
Section: Related Work
confidence: 99%
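The mixture-over-times reading in the statement above can be made concrete with a standard illustration (not quoted from either paper): if rewards are rescaled to [0, 1] and read as the probability of a binary reward event R emitted at a latent time T with geometric prior P(T = t) = (1 - \gamma)\gamma^t, then

P(R = 1; \pi) = \sum_{t=0}^{\infty} (1 - \gamma)\, \gamma^{t}\, \mathbb{E}_{\pi}\!\left[ r(s_t, a_t) \right],

so maximizing this likelihood is, up to the constant factor (1 - \gamma), the same as maximizing the discounted expected return, and the posterior over T is exactly a mixture over the times at which the reward is incorporated.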
“…This generalized the work of Cooper and Shachter to the case of infinite horizons, and cost functions over future states. More recently, this approach has been pursued by applying Bayesian procedures (or minimising Kullback-Leibler divergences) to problems of optimal decision making in MDPs (Botvinick and An 2008; Hoffman et al. 2009; Toussaint et al. 2008).…”
Section: Optimal Control As Inference
confidence: 99%
“…Following [15], it is possible to move the expectation inside the summation and rewrite the expected utility as…”
Section: Sequential Decision Making
confidence: 99%
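The step the last statement refers to, moving the expectation inside the summation, is linearity of expectation; as a generic illustration (the exact expression used in the citing paper and in its reference [15] is not shown here), for a discounted return one can write

\mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right] = \sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{E}\!\left[ r_t \right],

which is what makes it possible to evaluate the expected utility one time step (or one reward time) at a time.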