2014
DOI: 10.1007/978-3-662-44848-9_31

Policy Search for Path Integral Control

Abstract: Path integral (PI) control defines a general class of control problems for which the optimal control computation is equivalent to an inference problem that can be solved by evaluating a path integral over state trajectories. However, this potential is mostly unused in real-world problems because of two main limitations: first, current approaches can typically only be applied to learn open-loop controllers, and second, current sampling procedures are inefficient and not scalable to high dimensional s…
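The abstract's core idea, computing an optimal control as an inference problem over sampled state trajectories, can be sketched as a cost-weighted average of noisy control rollouts. This is a minimal illustration of the general PI control update, not the paper's own algorithm; the function name, signature, and fixed temperature `lam` are assumptions for illustration.

```python
import numpy as np

def pi_control_update(noisy_controls, path_costs, lam=1.0):
    """Path-integral-style update (illustrative sketch): reweight sampled
    control perturbations by the exponentiated negative path cost, so
    low-cost trajectories dominate the averaged control estimate."""
    costs = np.asarray(path_costs, dtype=float)
    # Subtract the minimum cost before exponentiating for numerical stability.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Optimal-control estimate: cost-weighted average of the sampled controls.
    return np.tensordot(w, np.asarray(noisy_controls), axes=1)
```

With a small temperature, the estimate collapses onto the lowest-cost rollout; with equal costs, it reduces to a plain average, which matches the intuition that sampling efficiency hinges on how concentrated these weights become in high dimensions.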

Cited by 30 publications (43 citation statements)
References 12 publications
“…11. Learning curves for REPS and DREPS with the reward function in (14). Policy updates were calculated after every 50 rollouts.…”
Section: B. Real Robot Multi-Modal Problem
Confidence: 99%
“…In the following section, we will detail how to build a clustered data structure for DREPS, followed by the algorithm's derivation, which is done similarly to REPS and other information-theoretic Policy Search approaches [14].…”
Section: Introduction
Confidence: 99%
“…The unique traits of the LSOC framework have been exploited to derive a class of so-called PIC methods. The interested reader is referred to earlier references [26, 27, 28, 30, 32, 41, 43, 44, 45, 46, 47]. An overview of applications was already given in the introduction.…”
Section: Path Integral Control
Confidence: 99%
“…Since a temporal logic reward (described in the next section) depends on the entire trajectory, it doesn't have the notion of cost-to-go and can only be evaluated as a terminal reward. Therefore p(τ_i) (written short as p_i) is computed once and used for the updates of all θ_t (a similar approach is used in episodic PI-REPS [4]). The resulting update equations are…”
Section: B. Relative Entropy Policy Search
Confidence: 99%
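The episodic scheme described in that citation, computing one set of trajectory weights p_i from whole-trajectory returns and reusing it for every time step's parameter update, can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and the temperature `eta` is fixed here, whereas REPS actually obtains it by minimizing a dual function under a KL constraint.

```python
import numpy as np

def reps_weights(returns, eta=1.0):
    """Episodic REPS-style trajectory weights, p_i ∝ exp(R_i / eta).
    Illustrative sketch with a fixed temperature eta; in REPS proper,
    eta comes from solving the dual of the KL-constrained problem."""
    r = np.asarray(returns, dtype=float)
    # Subtract the max return before exponentiating for numerical stability.
    w = np.exp((r - r.max()) / eta)
    return w / w.sum()

def weighted_mean_update(thetas, weights):
    """Reuse a single set of trajectory weights for the parameter update
    at every time step, as in the episodic setting quoted above."""
    return np.tensordot(weights, np.asarray(thetas), axes=1)
```

Because the terminal (temporal-logic) reward only scores the full trajectory, the weights are computed once per batch of rollouts and then applied uniformly when updating each θ_t.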