“…In such cases, reinforcement learning and policy search algorithms that can learn from a robot's experience have been shown to be successful [8], [9] for tasks such as object manipulation [10], [11], [12], locomotion [13], [14], [15], [16], and flight [17]. However, most of this work uses a model-free component to approximate features of the robot or the world that cannot be modeled, while still using model-based controllers for other parts of the system [12], [18]. In work where flexibility is taken into consideration, learning is still based either on building a more complex model [6], [19], [20] or an approximate model [21], or on plugging a learned model component into a model-based controller. Recently, end-to-end model-free methods using deep reinforcement learning have been demonstrated successfully on real rigid robots [22], [23], [16].…”