2018
DOI: 10.48550/arxiv.1804.00645
Preprint

Universal Planning Networks

Abstract: A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learne…
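
The planning computation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear latent dynamics, the constants `DECAY` and `GAIN`, and all function names are hypothetical stand-ins for the learned forward model and encoder. The sketch unrolls the forward model over a horizon of 5, measures the squared distance between the final predicted latent state and the goal latent, and refines the action plan by gradient descent, backpropagating by hand through the unrolled steps.

```python
from typing import List

Vec = List[float]

# Hypothetical latent dynamics (an assumption, standing in for a learned
# forward model): z' = DECAY * z + GAIN * a, applied per dimension.
DECAY = 0.9
GAIN = 0.5


def forward_model(z: Vec, a: Vec) -> Vec:
    """Predict the next latent state from the current state and action."""
    return [DECAY * zi + GAIN * ai for zi, ai in zip(z, a)]


def rollout(z0: Vec, plan: List[Vec]) -> Vec:
    """Unroll the forward model over the whole plan; return the final latent."""
    z = z0
    for a in plan:
        z = forward_model(z, a)
    return z


def plan_loss(z0: Vec, plan: List[Vec], z_goal: Vec) -> float:
    """Squared distance between the final predicted latent and the goal latent."""
    z_final = rollout(z0, plan)
    return sum((p - g) ** 2 for p, g in zip(z_final, z_goal))


def plan_gradient(z0: Vec, plan: List[Vec], z_goal: Vec) -> List[Vec]:
    """Backpropagate the loss through the unrolled model to every action."""
    z_final = rollout(z0, plan)
    grad_z = [2.0 * (p - g) for p, g in zip(z_final, z_goal)]  # dL/dz_T
    grads: List[Vec] = [[] for _ in plan]
    for t in reversed(range(len(plan))):
        grads[t] = [GAIN * gz for gz in grad_z]   # dL/da_t
        grad_z = [DECAY * gz for gz in grad_z]    # dL/dz_t
    return grads


def gradient_descent_plan(z0: Vec, z_goal: Vec, horizon: int = 5,
                          steps: int = 200, lr: float = 0.2) -> List[Vec]:
    """Start from an all-zero plan and refine it by gradient descent."""
    plan = [[0.0] * len(z0) for _ in range(horizon)]
    for _ in range(steps):
        grads = plan_gradient(z0, plan, z_goal)
        plan = [[a - lr * g for a, g in zip(a_t, g_t)]
                for a_t, g_t in zip(plan, grads)]
    return plan


if __name__ == "__main__":
    start, goal = [0.0, 0.0], [1.0, -1.0]
    best_plan = gradient_descent_plan(start, goal)
    print("final latent:", rollout(start, best_plan))
    print("loss:", plan_loss(start, best_plan, goal))
```

In a learned system the hand-written backward pass would be replaced by automatic differentiation, and the latent states would come from an encoder applied to observations; both are omitted here to keep the sketch dependency-free.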

Cited by 34 publications (62 citation statements)
References 35 publications (49 reference statements)
“…Additionally, we compare to two approaches that plan over a finite horizon by gradient descent. The first approach is based on the Universal Planning Network with a horizon of 5 (UPN) [11]. We also compare to an approach that extends UPN with neuromodulation for the integrated policy and dynamics model along with task embedding (TE-CPN).…”
Section: A Methods For Comparison
Citation type: mentioning
confidence: 99%
“…The idea can be extended by using a dynamics model to predict the latent state resulting from an action. Prior work [11], [15], [16], [17], [18] has demonstrated the value of using the dynamics model to unroll the policy over a planning horizon with the goal of minimizing the distance between the latent representations of the final predicted state and the goal.…”
Section: Planning By Backpropagation
Citation type: mentioning
confidence: 99%
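
The objective described in this citation statement can be written compactly. The notation here is an assumption introduced for illustration and is not taken from the cited papers: $\phi$ is an encoder, $f$ a latent dynamics model, $o_0$ and $o_g$ the current and goal observations, $T$ the planning horizon, and $\alpha$ a step size.

\[
z_0 = \phi(o_0), \qquad z_{t+1} = f(z_t, a_t), \quad t = 0, \dots, T-1
\]
\[
\mathcal{L}(a_{0:T-1}) = \lVert z_T - \phi(o_g) \rVert_2^2, \qquad
a_{0:T-1} \leftarrow a_{0:T-1} - \alpha \, \nabla_{a_{0:T-1}} \mathcal{L}
\]

Because $z_T$ depends on every action only through repeated applications of $f$, the gradient is obtained by backpropagating through the unrolled dynamics model.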