One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI 2 (Policy Improvement with Path Integrals). PI 2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI 2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on the cost function design to specify the task. From the viewpoint of robotics, a particular useful property of PI 2 is that it can scale to problems of many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI 2 algorithm and its theoretical properties, and how it is applied to gain scheduling for variable impedance control. We evaluate our approach by presenting results on several simulated and real robots. We consider tasks involving accurate tracking through via points, and manipulation tasks requiring physical contact with the environment. In these tasks, the optimal strategy requires both tuning of a reference trajectory and the impedance of the end-effector. The results show that we can use path integral based reinforcement learning not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
In this paper we present a model predictive control algorithm designed for optimizing non-linear systems subject to complex cost criteria. The algorithm is based on a stochastic optimal control framework using a fundamental relationship between the information theoretic notions of free energy and relative entropy. The optimal controls in this setting take the form of a path integral, which we approximate using an efficient importance sampling scheme. We experimentally verify the algorithm by implementing it on a Graphics Processing Unit (GPU) and apply it to the problem of controlling a fifth-scale Auto-Rally vehicle in an aggressive driving task.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.