Robotics: Science and Systems XIII 2017
DOI: 10.15607/rss.2017.xiii.048
Preparing for the Unknown: Learning a Universal Policy with Online System Identification

Abstract: We present a new method of learning control policies that successfully operate under unknown dynamic models. We create such policies by leveraging a large number of training examples that are generated using a physical simulator. Our system is made of two components: a Universal Policy (UP) and a function for Online System Identification (OSI). We describe our control policy as universal because it is trained over a wide array of dynamic models. These variations in the dynamic model may include differ…
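The abstract describes a two-part architecture: a Universal Policy conditioned on the dynamics parameters, and an OSI module that estimates those parameters from recent state-action history. The PyTorch sketch below shows one plausible way to wire the two networks together; the layer widths, parameter dimension `mu_dim`, and history length are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a Universal Policy (UP) + Online System Identification (OSI)
# pairing, assuming a fixed-length history window and MLP networks.
# Dimensions and layer widths are illustrative, not the paper's values.
import torch
import torch.nn as nn


class UniversalPolicy(nn.Module):
    """Policy conditioned on both the state and the dynamics parameters mu."""

    def __init__(self, state_dim, mu_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + mu_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, mu):
        # Concatenate the state with the (true or estimated) dynamics parameters.
        return self.net(torch.cat([state, mu], dim=-1))


class OSI(nn.Module):
    """Maps a short history of (state, action) pairs to an estimate of mu."""

    def __init__(self, state_dim, action_dim, mu_dim, history_len=10, hidden=128):
        super().__init__()
        in_dim = history_len * (state_dim + action_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, mu_dim),
        )

    def forward(self, history):
        # history: (batch, history_len, state_dim + action_dim)
        return self.net(history.flatten(start_dim=1))
```

During training in simulation the UP can be conditioned on the true simulator parameters; at deployment the OSI estimate stands in for them.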

Cited by 164 publications (147 citation statements)
References 23 publications
“…Several works have demonstrated improved performance with uncertain and complex dynamics using the reinforcement learning (RL) framework and training with randomized system parameters. In [5], the authors use a recurrent neural network to explicitly learn model parameters through real-time interaction with an environment; these parameters are then used to augment the observation for a standard reinforcement learning algorithm. In [6], the authors use a recurrent policy and value function in a modified deep deterministic policy gradient algorithm to learn a policy for a robotic manipulator arm that uses real camera images as observations.…”
Section: Introduction (mentioning)
confidence: 99%
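The idea quoted above for [5], a recurrent network that infers model parameters online and appends them to the observation, could look roughly like the following; the LSTM size and the way the estimate is concatenated to the observation are assumptions for illustration, not details taken from the cited works.

```python
# Sketch of the quoted idea: a recurrent estimator consumes the stream of
# (state, action) pairs, and its running parameter estimate is appended to
# the observation passed to an otherwise standard RL policy.
# Sizes and wiring are illustrative assumptions.
import torch
import torch.nn as nn


class RecurrentParamEstimator(nn.Module):
    def __init__(self, state_dim, action_dim, mu_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, mu_dim)

    def forward(self, transitions, hx=None):
        # transitions: (batch, time, state_dim + action_dim)
        out, hx = self.lstm(transitions, hx)
        mu_hat = self.head(out[:, -1])  # estimate from the latest time step
        return mu_hat, hx


def augment_observation(obs, mu_hat):
    # The estimated parameters are concatenated onto the observation, so any
    # standard RL algorithm can consume the augmented state unchanged.
    return torch.cat([obs, mu_hat], dim=-1)
```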
“…This limits most of these works to learning simple behaviours. Making policies robust for physics adaptation [36,32,51] is also receiving interest, but these methods haven't been shown to be powerful enough to work on real robots. Using bottlenecks [52] has been shown to help domain adaptation for simple tasks like reaching.…”
Section: Transfer From Simulation To the Real World (mentioning)
confidence: 99%
“…Most methods in this class try to infer the latent input using observations from the target environment. For example, Yu et al. [26] conditioned the policy on the physics parameters of the robot, and trained a separate prediction model that estimates the physics parameters given the history of observations and actions. These methods can potentially adapt to changes in environments in an online fashion.…”
Section: B. Adapting Control Policy To Novel Tasks (mentioning)
confidence: 99%
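As a rough illustration of the online adaptation described in the statement above, the loop below repeatedly estimates the physics parameters from the most recent observation-action history and feeds the estimate to the parameter-conditioned policy. Here `env` is a hypothetical Gym-style environment, and `policy` and `osi` stand in for the networks sketched after the abstract; all names and sizes are assumptions.

```python
# Hypothetical online-adaptation loop: estimate dynamics parameters from a
# rolling history, then act with the parameter-conditioned policy.
from collections import deque

import torch


def run_episode(env, policy, osi, mu_dim, history_len=10, max_steps=1000):
    state = env.reset()
    history = deque(maxlen=history_len)
    mu_hat = torch.zeros(mu_dim)  # neutral guess before enough history exists

    for _ in range(max_steps):
        s = torch.as_tensor(state, dtype=torch.float32)
        with torch.no_grad():
            action = policy(s.unsqueeze(0), mu_hat.unsqueeze(0)).squeeze(0)

        next_state, reward, done, _ = env.step(action.numpy())
        history.append(torch.cat([s, action]))

        # Re-estimate the dynamics parameters once the window is full.
        if len(history) == history_len:
            window = torch.stack(list(history)).unsqueeze(0)
            with torch.no_grad():
                mu_hat = osi(window).squeeze(0)

        state = next_state
        if done:
            break
```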