2005
DOI: 10.1088/1742-5468/2005/11/p11011
Path integrals and symmetry breaking for optimal control theory

Abstract: This paper considers linear-quadratic control of a non-linear dynamical system subject to arbitrary cost. I show that for this class of stochastic control problems the non-linear Hamilton-Jacobi-Bellman equation can be transformed into a linear equation. The transformation is similar to the transformation used to relate the classical Hamilton-Jacobi equation to the Schrödinger equation. As a result of the linearity, the usual backward computation can be replaced by a forward diffusion process that can be comp…
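The linearization the abstract refers to can be sketched as follows (a standard log-transform argument in the style of Kappen's path-integral control work; the notation below is my own summary, not quoted from this paper). Take dynamics dx = (b(x,t) + u)dt + dξ with noise variance ν, quadratic control cost ½u⊤Ru, and state cost V(x,t):

```latex
% HJB with quadratic control cost; minimizing over u gives u^* = -R^{-1}\nabla J:
-\partial_t J = -\tfrac{1}{2}(\nabla J)^\top R^{-1} \nabla J + V + b^\top \nabla J
              + \tfrac{\nu}{2}\,\nabla^2 J .
% The log transform \psi = e^{-J/\lambda}, with \lambda R^{-1} = \nu, cancels the
% quadratic term and yields a linear, Schrodinger-like backward equation:
\partial_t \psi = \Big( \tfrac{V}{\lambda} - b^\top \nabla
              - \tfrac{\nu}{2}\,\nabla^2 \Big) \psi .
```

Because the equation for ψ is linear, it admits a Feynman-Kac representation, which is what allows the backward computation to be replaced by a forward diffusion.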

Cited by 238 publications (282 citation statements)
References 16 publications
“…A brief version is provided for the reader's convenience; for the technical details we refer to, e.g., [36]. Let L^u = Δ + (u − ∇V(x)) · ∇ denote the infinitesimal generator of (6). Here the superscript indicates the explicit dependence on the control variable.…”
Section: (10)
mentioning confidence: 99%
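The generator in the quoted statement follows from Itô's formula; a brief sketch, assuming (as the unit-coefficient Laplacian suggests) dynamics of the form dX = (u − ∇V(X))dt + √2 dW:

```latex
% Ito's formula for dX_t = (u - \nabla V(X_t))\,dt + \sqrt{2}\,dW_t:
df(X_t) = \big[\, (u - \nabla V(X_t))\cdot\nabla f(X_t) + \Delta f(X_t) \,\big]\,dt
        + \sqrt{2}\,\nabla f(X_t)\cdot dW_t ,
% so the infinitesimal generator acting on smooth f is
L^u f = \Delta f + (u - \nabla V(x))\cdot\nabla f .
```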
“…Here the superscript indicates the explicit dependence on the control variable. We have to show that the solutions to (10) yield optimal controls that maximize (8)-(9) subject to (6). Now choose a…”
Section: (10)
mentioning confidence: 99%
“…Additionally, we require both W and R to be positive definite and bounded everywhere on Ω, but otherwise impose no restrictions on them. Contrary to the assumptions in previous work [9,10,14] and the work of Kappen [7] and Broek et al [12], they are no longer required to be related as inverses of each other. As formulated, the control u and the noise w enter the state equation via the same matrix G. However, the problem can easily be reformulated such that the control and noise enter via different matrices as long as they have the same column space [14].…”
Section: Problem Formulation
mentioning confidence: 82%
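The last remark in the quoted passage, about control and noise entering via different matrices with a common column space, can be made concrete as follows (my own illustration of the reformulation, not taken verbatim from [14]):

```latex
% Suppose dx = f(x)\,dt + B\,u\,dt + C\,dw with
% \mathrm{Col}(B) = \mathrm{Col}(C) = \mathrm{Col}(G).
% Write B = G M_B and C = G M_C with M_B, M_C invertible, and substitute
% \tilde u = M_B u, \; d\tilde w = M_C\,dw :
dx = f(x)\,dt + G\,\big( \tilde u\,dt + d\tilde w \big),
% with the control-cost weight transformed accordingly:
\tfrac{1}{2}\, u^\top R\, u \;=\; \tfrac{1}{2}\, \tilde u^\top
  \big( M_B^{-\top} R\, M_B^{-1} \big)\, \tilde u .
```

The noise covariance is likewise rescaled by M_C, so the reformulated problem has the common-matrix form assumed in the quoted formulation.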
“…In recent years, general reinforcement learning has yielded three kinds of policy search approaches that have translated particularly well into the domain of robotics: (i) policy gradients approaches based on likelihood-ratio estimation [Sutton et al, 1999], (ii) policy updates inspired by expectation-maximization [Toussaint et al, 2010], and (iii) the path integral methods [Kappen, 2005]. Likelihood-ratio policy gradient methods rely on perturbing the motor command instead of comparing in policy space.…”
Section: Policy Search
mentioning confidence: 99%
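As a minimal illustration of the path-integral method referenced above, here is a toy 1-D Monte Carlo controller (my own sketch under simplifying assumptions, not code from any of the cited works): uncontrolled rollouts are reweighted by exp(−S/λ), and the optimal control is the weighted average of the first noise increment. For dynamics dx = u dt + dξ with unit noise and cost ∫½u²dt + ½x_T², the exact LQ answer at t = 0 is u = −x₀/(1 + T).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D path-integral control estimate. Uncontrolled dynamics: dx = dxi,
# noise variance nu; the running state cost is zero, and the path cost is
# purely terminal: phi(x_T) = 0.5 * x_T**2.
nu, T, n_steps, n_samples = 1.0, 1.0, 50, 20_000
dt = T / n_steps
lam = nu  # lambda = nu * R, with control-cost weight R = 1


def pi_control(x0):
    """Monte Carlo estimate of the optimal control u(x0) at t = 0."""
    # Sample uncontrolled rollouts: Gaussian noise increments per step.
    dxi = rng.normal(0.0, np.sqrt(nu * dt), size=(n_samples, n_steps))
    x_T = x0 + dxi.sum(axis=1)
    S = 0.5 * x_T**2                  # path cost (terminal only here)
    w = np.exp(-(S - S.min()) / lam)  # importance weights (shifted for stability)
    # Optimal control = weighted mean of the first noise increment, per unit time.
    return float((w * dxi[:, 0]).sum() / (w.sum() * dt))


u0 = pi_control(x0=2.0)  # exact LQ answer: -x0 / (1 + T) = -1.0
```

Subtracting the minimum cost before exponentiating leaves the weighted average unchanged but avoids underflow; in higher dimensions or with longer horizons the weights degenerate, which is the importance-sampling difficulty that much of the follow-up literature addresses.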