Hierarchical Neural Dynamic Policies

Bahl, Shikhar; Gupta, Abhinav; Pathak, Deepak

doi:10.15607/rss.2021.xvii.023

Cited by 21 publications

(26 citation statements)

References 25 publications

(33 reference statements)

“…State representation [24], [53], [86], [87], [97] [22], [55], [138] Reward design [38], [119] [18] [93] [135] [23], [31], [33], [54] Abstract learning [27] [106], [107] [3], [16], [82], [134], [136] Offline RL [26] [1], [20], [39], [63], [116], [133], [140] Parallel learning [48], [114] [11], [32], [44], [58], [79], [80], [88], [113] Learning from demonstration [7], [35] [19]…”

Section: Guided RL Methods Source (mentioning)

confidence: 99%

“…Other approaches first train a teacher policy on unchanging task setups via, e.g., RL and distill a policy capable of interpolating among different task setups, such as [7], which chooses neural dynamical policies [8] to present the teacher and students. Others learn teacher policies on true state information to then derive a student policy conditioned on a reduced or substituted input space, e.g., Chen et al [21], whose final vision-based policies are able to reorient objects in the shadow hand domain, and Lee et al [68], who also distill a vision-based policy and test it in their red-green-blue-stacking benchmark.…”

Section: Learning From Demonstration (mentioning)

confidence: 99%

“…Second, simplifying the learning task by means of task-specific action spaces and hybrid model-based and model-free approaches can improve the overall efficiency. Finally, training based on expert demonstrations tends to be rich in information and hence can accelerate policy training [7], [124], [131]. Furthermore, the efficiency can likely be improved by employing more instructive state representations, applying a curriculum to gradually tackle difficult learning tasks, and utilizing accurate simulation environments.…”

Section: Improving the Efficiency (mentioning)

confidence: 99%

“…Using Multiple Guided RL Methods First, we find that the guided RL-compliant papers tend to use a variety of guided RL methods. For instance, [7], [27], and [51] utilize at least three guided RL approaches, while [43], [70], and [82] deploy five or more approaches to obtain improvements in all three guided RL dimensions.…”

Section: Key Insights On Guided RL Compliance (mentioning)

confidence: 99%

Guided Reinforcement Learning: A Review and Evaluation for Efficient and Effective Real-World Robotics [Survey]

Eßer, Bach, Jestel et al. 2023

IEEE Robot. Automat. Mag.

Recent successes aside, reinforcement learning (RL) still faces significant challenges in its application to the real-world robotics domain. Guiding the learning process with additional knowledge offers a potential solution, thus leveraging the strengths of data- and knowledge-driven approaches. However, this field of research encompasses several disciplines and hence would benefit from a structured overview. In this article, we propose a concept of guided RL that provides a systematic approach toward accelerating the training process and improving performance for real-world robotics settings. We introduce a taxonomy that structures guided RL approaches and shows how different sources of knowledge can be integrated into the learning pipeline in a practical way. Based on this, we describe available approaches in this field and quantitatively evaluate their specific impact in terms of efficiency, effectiveness, and sim-to-real transfer within the robotics domain. However, learning control policies in such a way naturally requires many interactions with the environment. This emphasizes the importance of both collecting high-quality samples and exploring the search space in a sample-efficient manner. While directly learning on real robots is appealing, it comes along with substantial challenges, such as high sample cost, partial observability, and safety constraints [28]. Hence, simulators are often

“…Within hierarchical RL, there also exists a class of compositional methods in which a high-level policy issues commands to be executed by a low-level policy. The commands can be specified by learned latent representations [32], [33], [34], low-level control parameters [35], [36], [37], [38], or subgoals, which can be hand-designed [39], task-specific locations [40], [41], [42], [43], [44], [45], [46], target object states [47], [48], [49], or target states in a learned latent space [50], [51], [52]. In contrast to these works, our multi-frequency system uses no explicit skills, commands, or subgoals.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower

Wu, Sun, Zeng et al. 2022

Preprint

We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle. Due to the chaotic nature of aerodynamic forces, a blowing controller must (i) continually adapt to unexpected changes from its actions, (ii) maintain fine-grained control, since the slightest misstep can result in large unintended consequences (e.g., scatter objects already in a pile), and (iii) infer long-range plans (e.g., move the robot to strategic blowing locations). We tackle these challenges in the context of deep reinforcement learning, introducing a multi-frequency version of the spatial action maps framework. This allows for efficient learning of vision-based policies that effectively combine high-level planning and low-level closed-loop control for dynamic mobile manipulation. Experiments show that our system learns efficient behaviors for the task, demonstrating in particular that blowing achieves better downstream performance than pushing, and that our policies improve performance over baselines. Moreover, we show that our system naturally encourages emergent specialization between the different subpolicies spanning low-level fine-grained control and high-level planning. On a real mobile robot equipped with a miniature air blower, we show that our simulation-trained policies transfer well to a real environment and can generalize to novel objects.
