Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.023

Hierarchical Neural Dynamic Policies

Abstract: We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical-system-based methods has successfully demonstrated dynamic robot behaviors but has difficulty both generalizing to unseen configurations and learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but still struggle …
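The "reparameterized actions" mentioned in the abstract refer to policies that output the parameters of a second-order dynamical system (a dynamic-movement-primitive-style attractor) rather than raw low-level commands. The sketch below is a minimal, illustrative integration of such a system; the function name, basis-function layout, and gain values are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def dmp_rollout(w, g, y0, n_steps=100, dt=0.01, alpha=25.0, beta=6.25, tau=1.0):
    """Integrate a 1-D dynamic-movement-primitive-style system.

    w  : (K,) forcing-term basis weights (in an NDP-style policy, predicted by the network)
    g  : goal position (also a network output in that setting)
    y0 : start position
    """
    K = len(w)
    c = np.exp(-3.0 * np.linspace(0, 1, K))  # basis centres along the phase variable
    h = K / c                                # basis widths (a common heuristic)

    x, y, dy = 1.0, y0, 0.0                  # phase, position, velocity
    traj = []
    for _ in range(n_steps):
        psi = np.exp(-h * (x - c) ** 2)
        # Forcing term: weighted basis activation, scaled by phase and movement amplitude.
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)
        # Critically damped spring-damper toward the goal, plus the forcing term.
        ddy = (alpha * (beta * (g - y) - dy) + f) / tau
        dy += ddy * dt
        y += dy * dt
        x += -tau * 3.0 * x * dt             # canonical system: phase decays to 0
        traj.append(y)
    return np.array(traj)
```

With zero forcing weights the system reduces to a stable attractor that converges to the goal, which is the structural prior that makes the action space well-behaved; the network's job is then only to shape the transient via `w` and `g`.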

Cited by 21 publications (26 citation statements)
References 25 publications (33 reference statements)
“…State representation [24], [53], [86], [87], [97], [22], [55], [138]; Reward design [38], [119], [18], [93], [135], [23], [31], [33], [54]; Abstract learning [27], [106], [107], [3], [16], [82], [134], [136]; Offline RL [26], [1], [20], [39], [63], [116], [133], [140]; Parallel learning [48], [114], [11], [32], [44], [58], [79], [80], [88], [113]; Learning from demonstration [7], [35], [19]…”
Section: Guided RL Methods
Source: mentioning, confidence: 99%
“…Other approaches first train a teacher policy on unchanging task setups via, e.g., RL and then distill a policy capable of interpolating among different task setups, such as [7], which chooses neural dynamic policies [8] to represent the teacher and student. Others learn teacher policies on true state information and then derive a student policy conditioned on a reduced or substituted input space, e.g., Chen et al. [21], whose final vision-based policies are able to reorient objects in the shadow-hand domain, and Lee et al. [68], who also distill a vision-based policy and test it in their red-green-blue-stacking benchmark.…”
Section: Learning From Demonstration
Source: mentioning, confidence: 99%
“…Within hierarchical RL, there also exists a class of compositional methods in which a high-level policy issues commands to be executed by a low-level policy. The commands can be specified by learned latent representations [32], [33], [34], low-level control parameters [35], [36], [37], [38], or subgoals, which can be hand-designed [39], task-specific locations [40], [41], [42], [43], [44], [45], [46], target object states [47], [48], [49], or target states in a learned latent space [50], [51], [52]. In contrast to these works, our multi-frequency system uses no explicit skills, commands, or subgoals.…”
Section: Related Work
Source: mentioning, confidence: 99%
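The compositional scheme this excerpt describes (a high-level policy issuing commands, here subgoals, that a low-level policy executes at a higher frequency) can be sketched as a toy control loop. The policies, gains, and dynamics below are hypothetical placeholders for illustration, not any cited method.

```python
import numpy as np

def run_episode(n_steps=50, H=10, seed=0):
    """Toy two-level rollout: the high level emits a subgoal every H steps;
    the low level tracks it with a simple proportional controller."""
    rng = np.random.default_rng(seed)
    pos = np.zeros(2)                      # toy agent state: 2-D position
    subgoal = pos.copy()
    for t in range(n_steps):
        if t % H == 0:                     # high-level decision interval
            # Hypothetical high-level policy: pick a nearby target position.
            subgoal = pos + rng.uniform(-0.1, 0.1, size=2)
        action = 0.5 * (subgoal - pos)     # hypothetical low-level controller
        pos = pos + action                 # toy dynamics: action moves the agent
    return pos
```

The key design point the excerpt contrasts against is the explicit command interface: here the high level communicates only through `subgoal`, whereas the cited multi-frequency system removes that interface entirely.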