Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a modelbased reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high taskspecific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure modelbased approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3 − 5× on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
Abstract-Model predictive control (MPC) is an effective method for controlling robotic systems, particularly autonomous aerial vehicles such as quadcopters. However, application of MPC can be computationally demanding, and typically requires estimating the state of the system, which can be challenging in complex, unstructured environments. Reinforcement learning can in principle forego the need for explicit state estimation and acquire a policy that directly maps sensor readings to actions, but is difficult to apply to unstable systems that are liable to fail catastrophically during training before an effective policy has been found. We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment. This data is used to train a deep neural network policy, which is allowed to access only the raw observations from the vehicle's onboard sensors. After training, the neural network policy can successfully control the robot without knowledge of the full state, and at a fraction of the computational cost of MPC. We evaluate our method by learning obstacle avoidance policies for a simulated quadrotor, using simulated onboard sensors and no explicit state estimation at test time.
Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world, and then use a localization and planning method to navigate through the internal map. However, these approaches often include a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real-world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model, and show our approach outperforms single-step and N -step double Q-learning. We also evaluate our approach on a real-world RC car and show it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg
We propose an information-theoretic planning approach that enables mobile robots to autonomously construct dense 3D maps in a computationally efficient manner. Inspired by prior work, we accomplish this task by formulating an information-theoretic objective function based on Cauchy-Schwarz quadratic mutual information (CSQMI) that guides robots to obtain measurements in uncertain regions of the map. We then contribute a two stage approach for active mapping. First, we generate a candidate set of trajectories using a combination of global planning and generation of local motion primitives. From this set, we choose a trajectory that maximizes the information-theoretic objective. Second, we employ a gradientbased trajectory optimization technique to locally refine the chosen trajectory such that the CSQMI objective is maximized while satisfying the robot's motion constraints. We evaluated our approach through a series of simulations and experiments on a ground robot and an aerial robot mapping unknown 3D environments. Real-world experiments suggest our approach reduces the time to explore an environment by 70% compared to a closest frontier exploration strategy and 57% compared to an information-based strategy that uses global planning, while simulations demonstrate the approach extends to aerial robots with higher-dimensional state. Global plans Local motion primitives (a) Global plans Local motion primitives (b) Global plans Local motion primitives Opt. Global plans Opt. Local motion primitives
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.