Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Preprint, 2018. DOI: 10.48550/arxiv.1805.12114

Cited by 50 publications (118 citation statements). References 0 publications.
“…State-based environments: We model the environment transition function using a neural ensemble of size N, where each network's output neurons parameterize a Gaussian distribution T = N(μ(s_t, a_t), Σ(s_t, a_t)) (Chua et al. 2018).…”
Section: E2 Uncertainty Estimators
confidence: 99%
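The excerpt above describes the probabilistic-ensemble transition model of Chua et al. (2018): N independent networks, each parameterizing a Gaussian over the next state. A minimal PyTorch sketch of that idea follows; the class names, hidden sizes, and training details are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a Gaussian-ensemble transition model: N networks, each
# mapping (s_t, a_t) to the mean and log-variance of a Gaussian over s_{t+1}.
# All names and sizes are illustrative.
import torch
import torch.nn as nn


class GaussianDynamics(nn.Module):
    """One ensemble member: predicts N(mu(s, a), diag(sigma^2(s, a)))."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * state_dim),  # mean and log-variance
        )

    def forward(self, s, a):
        mu, log_var = self.net(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        log_var = log_var.clamp(-10.0, 2.0)  # keep variances in a sane range
        return mu, log_var


class Ensemble(nn.Module):
    """Sampling a member captures epistemic uncertainty; sampling from that
    member's Gaussian captures aleatoric uncertainty."""

    def __init__(self, n_members: int, state_dim: int, action_dim: int):
        super().__init__()
        self.members = nn.ModuleList(
            [GaussianDynamics(state_dim, action_dim) for _ in range(n_members)]
        )

    def nll_loss(self, s, a, s_next):
        # Gaussian negative log-likelihood (up to constants), over members.
        losses = []
        for m in self.members:
            mu, log_var = m(s, a)
            inv_var = torch.exp(-log_var)
            losses.append(((mu - s_next) ** 2 * inv_var + log_var).mean())
        return torch.stack(losses).mean()

    def sample_next_state(self, s, a):
        # Pick a random member, then sample from its predictive Gaussian.
        m = self.members[torch.randint(len(self.members), (1,)).item()]
        mu, log_var = m(s, a)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
```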
“…Model-based RL approaches typically alternate between fitting a predictive model of the environment dynamics/rewards and updating the control policies. The model can be used in various ways, such as execution-time planning [5,21] or generating imaginary experiences for training the control policy [12,32]. Our work is inspired by [7], which addresses the problem of error in long-horizon model dynamics prediction.…”
Section: Related Work
confidence: 99%
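The excerpt mentions two common uses of a learned model; below is a hedged sketch of the first, execution-time planning via random-shooting model predictive control. `reward_fn` and `model.sample_next_state` are assumed interfaces (e.g. the ensemble sketched earlier), and the horizon and candidate counts are arbitrary.

```python
# Illustrative random-shooting MPC with a learned dynamics model:
# sample action sequences, roll them out through the model, and execute
# only the first action of the best sequence before replanning.
import torch


def random_shooting_mpc(model, reward_fn, s0, action_dim,
                        horizon=15, n_candidates=500,
                        action_low=-1.0, action_high=1.0):
    """Return the first action of the highest-return sampled sequence.
    reward_fn(s, a) is assumed to return a (n_candidates,) tensor."""
    actions = torch.empty(n_candidates, horizon, action_dim).uniform_(
        action_low, action_high)
    s = s0.expand(n_candidates, -1).clone()
    returns = torch.zeros(n_candidates)
    for t in range(horizon):
        a_t = actions[:, t]
        returns += reward_fn(s, a_t)
        s = model.sample_next_state(s, a_t)  # propagate through learned model
    best = returns.argmax()
    return actions[best, 0]
```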
“…On the one hand stands the use of model predictive control in the engineering community, where finely specified dynamics models are constructed by engineers and only a small number of parameters are fit with system identification to determine mass, inertia, joint stiffness, etc. On the other stands the hands-off approach taken in the RL community, where general and unstructured neural networks are used both for transition models [9,55,25] and for policies and value functions [20]. The state and action spaces for these systems are highly complex, with many diverse inputs like quaternions, joint angles, forces, and torques that each transform in different ways under a symmetry transformation such as a left-right reflection or a rotation.…”
Section: Approximate Symmetries in Reinforcement Learning
confidence: 99%
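To make the point about heterogeneous state components concrete, here is a toy sketch of how a left-right reflection could act on a composite state: ordinary vectors flip one coordinate, a rotation quaternion negates two components under conjugation by the mirror, and a symmetric joint pair swaps. The state layout is entirely hypothetical and not taken from any cited benchmark.

```python
# Toy illustration only: a y-mirror applied to a hypothetical state
# laid out as [position(3), orientation quat(4), joint pair(2), force(3)].
import numpy as np


def reflect_left_right(state):
    pos = state[0:3].copy()
    quat = state[3:7].copy()      # (w, x, y, z)
    joints = state[7:9].copy()    # a symmetric left/right joint pair
    force = state[9:12].copy()

    pos[1] *= -1.0                # ordinary vector: flip the y coordinate
    # Conjugating a rotation by the y-mirror negates the x and z axis
    # components of the quaternion and keeps w and y.
    quat[1] *= -1.0
    quat[3] *= -1.0
    joints = joints[::-1]         # swap the pair (signs depend on joint axes)
    force[1] *= -1.0              # force is a vector: flip the y component
    return np.concatenate([pos, quat, joints, force])
```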
“…State-of-the-art model-based approaches on MuJoCo tend to use an ensemble of small MLPs that predict the state transitions [9,55,25,2], without exploiting any structure of the state space. We evaluate test rollout predictions via the relative error of the state over different horizon lengths for the RPP model against an MLP, the method of choice.…”
Section: Better Transition
confidence: 99%
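The rollout evaluation described above can be sketched as follows: roll the learned model forward from true start states along a recorded trajectory and report the relative state error at several horizons. `model.predict_mean` is an assumed interface (a mean next-state prediction), not an API from the cited work.

```python
# Sketch of multi-step rollout relative error for a learned dynamics model.
import numpy as np
import torch


def rollout_relative_error(model, states, actions, horizons=(1, 5, 10, 20)):
    """states: (T+1, state_dim) ground-truth trajectory, actions: (T, act_dim).
    Returns {h: mean ||s_hat_h - s_h|| / ||s_h||} for each horizon h."""
    errors = {h: [] for h in horizons}
    max_h = max(horizons)
    T = actions.shape[0]
    for t0 in range(T - max_h):
        s_hat = states[t0]
        for k in range(1, max_h + 1):
            s_hat = model.predict_mean(s_hat, actions[t0 + k - 1])
            if k in errors:
                true = states[t0 + k]
                rel = torch.norm(s_hat - true) / (torch.norm(true) + 1e-8)
                errors[k].append(rel.item())
    return {h: float(np.mean(v)) for h, v in errors.items()}
```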