Robotics: Science and Systems VIII 2012
DOI: 10.15607/rss.2012.viii.045
On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference

Abstract: We present a reformulation of the stochastic optimal control problem in terms of KL divergence minimisation, not only providing a unifying perspective of previous approaches in this area, but also demonstrating that the formalism leads to novel practical approaches to the control problem. Specifically, a natural relaxation of the dual formulation gives rise to exact iterative solutions to the finite and infinite horizon stochastic optimal control problem, while direct application of Bayesian inference…
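For orientation, a minimal sketch of the KL-control objective this line of work revolves around (our notation; the paper's own formulation differs in detail):

    \min_{q}\; \mathbb{E}_{q}\!\left[ C(\tau) \right] \;+\; \mathrm{KL}\!\left( q(\tau) \,\|\, p_0(\tau) \right),
    \qquad
    q^{*}(\tau) \;\propto\; p_0(\tau)\, e^{-C(\tau)},

where p_0 is the trajectory distribution under passive (uncontrolled) dynamics and C the accumulated cost. The exponentiated-cost form of q* is what makes the optimal controlled distribution look like a Bayesian posterior, and hence lets inference machinery be applied directly.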

Citations: cited by 163 publications (261 citation statements)
References: 16 publications
“…Related work on stochastic optimal control (Kappen 2005a,b; van den Broek et al. 2008; Rawlik et al. 2010) exploits the reduction of control problems to inference problems, appealing to variational techniques to provide efficient and computationally tractable solutions; in particular, it formulates the problem in terms of Kullback-Leibler minimization (Kappen 2005a,b) and path integrals of cost functions using the Feynman-Kac formula (Theodorou et al. 2010; Braun et al. 2011).…”
Section: Optimal Control As Inference (citation type: mentioning; confidence: 99%)
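A minimal Monte Carlo sketch of the path-integral idea referenced above, assuming scalar state, additive control noise, and a temperature lam; all names here are illustrative, not taken from any of the cited papers:

    import numpy as np

    def path_integral_control(x0, dynamics, cost, lam=1.0, n_samples=1000,
                              horizon=20, noise_std=1.0, seed=0):
        # Sample open-loop noise rollouts under the passive dynamics and
        # importance-weight them by exp(-S / lam), the Feynman-Kac weighting.
        rng = np.random.default_rng(seed)
        eps = rng.normal(0.0, noise_std, size=(n_samples, horizon))
        S = np.zeros(n_samples)
        for i in range(n_samples):
            x = x0
            for t in range(horizon):
                S[i] += cost(x)
                x = dynamics(x, eps[i, t])
        w = np.exp(-(S - S.min()) / lam)    # shift for numerical stability
        w /= w.sum()
        return float(np.dot(w, eps[:, 0]))  # weighted first-step control

    # Toy usage: steer x toward the origin under noisy linear dynamics.
    u0 = path_integral_control(x0=2.0,
                               dynamics=lambda x, u: x + 0.1 * u,
                               cost=lambda x: x ** 2)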
“…In this study, we use an information-theoretic model of bounded rational decision-making (Ortega and Braun, 2012, 2013; Braun and Ortega, 2014) that has precursors in the economic literature (McKelvey and Palfrey, 1995; Mattsson and Weibull, 2002; Sims, 2003, 2005, 2006, 2010; Wolpert, 2006) and that is closely related to recent advances in the information theory of perception-action systems (Todorov, 2007, 2009; Still, 2009; Friston, 2010; Peters et al., 2010; Tishby and Polani, 2011; Daniel et al., 2012, 2013; Kappen et al., 2012; Rawlik et al., 2012; Rubin et al., 2012; Neymotin et al., 2013; Tkačik and Bialek, 2014; Palmer et al., 2015). The basis of this approach is formalized by a free energy principle that trades off expected utility against the cost of the computation required to adapt the system so as to achieve high utility.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
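As a concrete toy version of that free-energy trade-off (a sketch under our own naming, not code from the cited works): the bounded-rational policy is a softmax over utilities anchored at a prior, with the inverse temperature beta pricing computation:

    import numpy as np

    def bounded_rational_policy(utilities, prior, beta):
        # Maximises E[U] - (1/beta) * KL(p || prior);
        # the solution is p*(a) ∝ prior(a) * exp(beta * U(a)).
        logits = np.log(prior) + beta * np.asarray(utilities, dtype=float)
        logits -= logits.max()          # numerical stabilisation
        p = np.exp(logits)
        return p / p.sum()

    # beta -> 0 returns the prior (no deliberation);
    # large beta approaches the max-utility action.
    print(bounded_rational_policy([1.0, 0.8, 0.1], np.ones(3) / 3, beta=5.0))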
“…We will first review these two approaches and point out important differences to our approach. Subsequently, we will discuss SOC control algorithms that also use Kullback-Leibler divergence terms to determine the policy, e.g., dynamic policy programming [2], SOC by approximate inference [23] and path integral approaches [27,24]. We will also discuss the relation to approximate dynamic programming with soft-max operators [19] and existing policy search algorithms [8].…”
Section: Related Work (citation type: mentioning; confidence: 99%)
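To make the soft-max operator mentioned above concrete, here is a hedged tabular sketch (our own toy code, not any of the cited algorithms): the hard max in the Bellman backup is replaced by a log-sum-exp, the smoothing shared by the KL-regularised SOC family:

    import numpy as np

    def soft_value_iteration(P, R, beta=5.0, gamma=0.95, iters=200):
        # P: (A, S, S) transition probabilities, R: (S, A) rewards.
        # Soft backup: V(s) = (1/beta) * log sum_a exp(beta * Q(s, a)).
        V = np.zeros(R.shape[0])
        for _ in range(iters):
            Q = R + gamma * np.einsum('ast,t->sa', P, V)   # action values
            m = Q.max(axis=1)
            V = m + np.log(np.exp(beta * (Q - m[:, None])).sum(axis=1)) / beta
        return V

    # As beta -> inf this recovers standard value iteration's hard max.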
“…The goal of Stochastic Optimal Control (SOC), see [26,12,31,23], is to find the optimal control for a finite horizon of time steps. A common approach to solve the SOC problem is dynamic programming, i.e.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
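A compact sketch of that backward dynamic-programming recursion for a finite horizon (a tabular toy of our own, assuming known transitions P and stage rewards R):

    import numpy as np

    def finite_horizon_dp(P, R, T):
        # P: (A, S, S) transitions, R: (S, A) stage rewards, T: horizon.
        n_states, n_actions = R.shape
        V = np.zeros((T + 1, n_states))       # V[T] is the zero terminal value
        pi = np.zeros((T, n_states), dtype=int)
        for t in range(T - 1, -1, -1):        # sweep backward from the horizon
            Q = R + np.einsum('ast,t->sa', P, V[t + 1])
            pi[t] = Q.argmax(axis=1)          # greedy time-indexed policy
            V[t] = Q.max(axis=1)
        return pi, V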