Robotics: Science and Systems X 2014
DOI: 10.15607/rss.2014.x.052

Combining the benefits of function approximation and trajectory optimization

Abstract: Neural networks have recently solved many hard problems in Machine Learning, but their impact in control remains limited. Trajectory optimization has recently solved many hard problems in robotic control, but using it online remains challenging. Here we leverage the high-fidelity solutions obtained by trajectory optimization to speed up the training of neural network controllers. The two learning problems are coupled using the Alternating Direction Method of Multipliers (ADMM). This coupling enables t…
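To make the coupling concrete, here is a rough, self-contained sketch of an ADMM-style alternation on a toy 1-D double integrator, with a linear feedback law standing in for the neural network. The toy dynamics, cost, penalty weight, and all function names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy 1-D double integrator: state x = (position, velocity), control u = force.
dt, T = 0.1, 20

def rollout(u_seq, x0=np.zeros(2)):
    """Simulate the toy dynamics, returning the state visited at each step."""
    xs, x = [], x0
    for u in u_seq:
        xs.append(x)
        x = np.array([x[0] + dt * x[1], x[1] + dt * u])
    return np.array(xs)

def task_cost(u_seq):
    """Reach position 1 with small controls (stand-in for the task cost)."""
    xs = rollout(u_seq)
    return np.sum((xs[:, 0] - 1.0) ** 2) + 1e-2 * np.sum(u_seq ** 2)

def policy(theta, xs):
    """Linear feedback 'policy' u = theta[:2] . x + theta[2] (network stand-in)."""
    return xs @ theta[:2] + theta[2]

rho = 10.0
u, lam, theta = np.zeros(T), np.zeros(T), np.zeros(3)
for _ in range(15):
    # 1) Trajectory step: task cost plus quadratic coupling to the current policy.
    obj = lambda v: task_cost(v) + 0.5 * rho * np.sum(
        (v - policy(theta, rollout(v)) + lam) ** 2)
    u = minimize(obj, u, method="L-BFGS-B").x
    # 2) Policy step: least-squares fit of the policy to the augmented targets.
    xs = rollout(u)
    A = np.hstack([xs, np.ones((T, 1))])
    theta = np.linalg.lstsq(A, u + lam, rcond=None)[0]
    # 3) Dual update: accumulate the remaining control/policy disagreement.
    lam += u - policy(theta, xs)
```

Step 1 pulls the open-loop controls toward what the current policy would produce, step 2 regresses the policy onto the resulting trajectory, and step 3 accumulates any remaining disagreement in the dual variables, so the two problems are driven toward agreement at convergence.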

Cited by 85 publications (88 citation statements)
References 33 publications (29 reference statements)
“…where the covariance matrix has diagonal entries corresponding to the typical disturbance that the respective state component may encounter. The sampling idea is conceptually similar to fitting the tangent space of the demonstrator policy instead of just the nominal control command [16]. Unfortunately, despite our efforts to extract samples from MPC that cover a large volume in state space, there is still a bias of the state distribution towards those states that are encountered by the optimal MPC policy.…”
Section: Sampling From An MPC Solution (mentioning)
confidence: 99%
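A minimal sketch of what such disturbance-based sampling around a nominal MPC rollout could look like, assuming hypothetical `mpc_control` and `rollout_step` callables and a vector of per-component disturbance scales (an illustration, not the cited implementation):

```python
import numpy as np

def sample_mpc_dataset(mpc_control, rollout_step, x0, T, disturbance_scales,
                       samples_per_state=8, seed=0):
    """Collect (state, control) pairs around a nominal MPC rollout.

    Each visited state is perturbed with zero-mean Gaussian noise whose
    diagonal covariance reflects the typical disturbance of each state
    component; the MPC is re-queried at the perturbed states so the data
    cover a neighbourhood of the nominal trajectory, not only the states
    the optimal MPC policy itself visits.
    """
    rng = np.random.default_rng(seed)
    cov = np.diag(np.asarray(disturbance_scales, dtype=float) ** 2)
    data, x = [], np.asarray(x0, dtype=float)
    for _ in range(T):
        u = mpc_control(x)                                   # nominal command
        data.append((x.copy(), u))
        for dx in rng.multivariate_normal(np.zeros_like(x), cov,
                                          size=samples_per_state):
            data.append((x + dx, mpc_control(x + dx)))       # perturbed queries
        x = rollout_step(x, u)                               # follow nominal path
    return data
```

Even with the perturbed queries, the collected states remain concentrated near the nominal trajectory, which is exactly the distribution bias the excerpt points out.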
“…Characteristic of most GPS approaches is that the trajectories and the global policy are jointly optimized in an alternating fashion, with a penalty on their discrepancies. A deterministic variant was proposed by Mordatch and Todorov (2014). They also noted a connection to directly solving the deterministic trajectory optimization problem in Equation (3.9).…”
Section: Trajectory-guided Policy Learning (mentioning)
confidence: 99%
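In generic notation (my own, not necessarily that of Equation (3.9) in the citing work), the alternating scheme can be read as a penalty relaxation of the joint problem

```latex
\min_{x_{1:T},\, u_{1:T},\, \theta} \;\; \sum_{t=1}^{T} \ell(x_t, u_t)
  \;+\; \frac{\rho}{2} \sum_{t=1}^{T} \left\lVert u_t - \pi_\theta(x_t) \right\rVert^{2}
  \qquad \text{s.t.} \quad x_{t+1} = f(x_t, u_t),
```

where letting rho grow enforces u_t = pi_theta(x_t) and recovers direct optimization of the trajectory cost over the policy parameters, which is the connection to the deterministic trajectory optimization problem noted above.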
“…The constraints and the objectives will ensure that the behaviors of the system over the time horizon will satisfy the properties of interest, while optimizing some key performance metrics. A common solution to this problem lies in training a policy that "mimics" the input-output map of the MPC [14], [20]. Instead, our approach is based on two new ideas.…”
Section: Introduction (mentioning)
confidence: 99%
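In its simplest form, a policy that "mimics" the input-output map of the MPC, as in [14], [20], is a supervised regressor from states to MPC commands. Below is a minimal sketch using ridge-regularized polynomial least squares as a stand-in for a neural network; the feature choice and function names are assumptions for illustration.

```python
import numpy as np

def quad_features(X):
    """[1, x, upper-triangular x_i * x_j] features for each row of X."""
    X = np.atleast_2d(X)
    iu = np.triu_indices(X.shape[1])
    quad = np.stack([X[:, i] * X[:, j] for i, j in zip(*iu)], axis=1)
    return np.hstack([np.ones((len(X), 1)), X, quad])

def fit_mpc_imitator(states, controls, reg=1e-6):
    """Ridge-regularized least squares from states to MPC commands,
    a simple stand-in for the neural-network policy in [14], [20]."""
    Phi, U = quad_features(states), np.asarray(controls)
    W = np.linalg.solve(Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ U)
    return lambda x: (quad_features(x) @ W)[0]   # fast surrogate for online MPC
```

Trained on pairs gathered as in the sampling sketch above, the returned function can be queried at new states as a fast surrogate for running the MPC online, at the cost of the distribution-shift issues discussed in the first excerpt.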