2022
DOI: 10.48550/arxiv.2204.07137
Preprint

Accelerated Policy Learning with Parallel Differentiable Simulation

Abstract: Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance…
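The approach outlined in the abstract trains a policy directly from analytic simulation gradients rather than from sampled returns alone. The snippet below is a minimal sketch of that general idea, not the paper's algorithm: a toy differentiable point-mass model stands in for the simulator, and the horizon, network sizes, and cost terms are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): update a policy by
# backpropagating a short-horizon cost through differentiable dynamics.
# The point-mass model, horizon, and hyperparameters are assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def dynamics(state, action, dt=0.05):
    # Semi-implicit Euler step for a unit-mass point driven by `action`.
    pos, vel = state[..., 0], state[..., 1]
    vel = vel + dt * action.squeeze(-1)
    pos = pos + dt * vel
    return torch.stack([pos, vel], dim=-1)

for step in range(1000):
    state = torch.randn(64, 2)              # batch of parallel rollouts
    cost = torch.zeros(())
    for _ in range(16):                     # short differentiable horizon
        action = policy(state)
        state = dynamics(state, action)
        cost = cost + (state[..., 0] ** 2).mean() + 0.01 * (action ** 2).mean()
    optimizer.zero_grad()
    cost.backward()                         # gradients flow through the dynamics
    optimizer.step()
```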

Cited by 6 publications (9 citation statements)
References 15 publications
“…The proposed tasks enabled us to study manipulation problems with fluids comprehensively, and shed light on the challenges of existing methods and the potential of utilizing differentiable physics as an effective tool for trajectory optimization in the context of complex fluid manipulation. One future direction is to extend towards more realistic problem setups using visual input, and to distill policies optimized using differentiable physics into neural-network-based policies (Lin et al., 2022a; Xu et al., 2023), or to use gradients provided by differentiable simulation to guide policy learning (Xu et al., 2022). Furthermore, since MPM was originally proposed, researchers have been using it to simulate various realistic materials with intricate properties, such as snow, sand, mud, and granular materials.…”
Section: Discussion (citation type: mentioning, confidence: 99%)
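The statement above mentions distilling behaviour optimized with differentiable physics into neural-network policies. As a hedged illustration of that general recipe, the sketch below regresses a small policy onto state-action pairs from already-optimized trajectories; the placeholder data (`opt_states`, `opt_actions`) and network sizes are assumptions, not anything from the cited papers.

```python
# Hedged sketch: distill trajectories found by differentiable-physics
# optimization into a neural-network policy via supervised regression.
# `opt_states` / `opt_actions` are placeholders for optimized trajectories.
import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

opt_states = torch.randn(4096, 2)     # states visited by optimized trajectories
opt_actions = torch.randn(4096, 1)    # corresponding optimized actions

for epoch in range(50):
    loss = F.mse_loss(policy(opt_states), opt_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```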
“…Another approach is to implement physics simulations in a differentiable manner, usually with automatic differentiation tools (Hu et al., 2019a; de Avila Belbute-Peres et al., 2018; Xu et al., 2022; Huang et al., 2021), and the gradient information provided by these simulations has been shown to be helpful in control and manipulation tasks concerning rigid and soft bodies (Xu et al., 2021; Lin et al., 2022a; Xu et al., 2022; Li et al., 2022; Wang et al., 2023). On a related note, Taichi (Hu et al., 2019b) and Nvidia Warp (Macklin, 2022) are two recently proposed domain-specific programming languages for GPU-accelerated simulation.…”
Section: Related Work (citation type: mentioning, confidence: 99%)
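The quote above points to simulators written with automatic differentiation tools so that gradients of simulation outcomes are available to downstream optimizers. The toy snippet below shows only that bare mechanism; it is not the API of Taichi, Warp, or any cited simulator, and the one-step point-mass model is an assumption.

```python
# Hedged illustration: a toy differentiable simulation step. Automatic
# differentiation yields d(outcome)/d(action) directly; this does not
# reflect the interface of any particular differentiable simulator.
import torch

def sim_step(state, action, dt=0.05):
    # Semi-implicit Euler step for a unit-mass point driven by `action`.
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    return torch.stack([pos, vel])

action = torch.tensor(0.3, requires_grad=True)
state = torch.tensor([0.0, 0.0])

next_state = sim_step(state, action)
outcome = (next_state[0] - 1.0) ** 2      # squared distance to a target position
outcome.backward()
print(action.grad)                        # gradient of the outcome w.r.t. the action
```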
“…This paper delves into the model-based setting (Chua et al., 2018; Janner et al., 2019; Kaiser et al., 2019; Zhang, 2022; Hafner et al., 2023), where a learned model is employed to train a control policy. Recent approaches (Mora et al., 2021; Suh et al., 2022a,b; Xu et al., 2022) based on differentiable simulators (Freeman et al., 2021; Heiden et al., 2021b) assume that gradients of simulation outcomes w.r.t. actions are explicitly given.…”
Section: Related Work (citation type: mentioning, confidence: 99%)
“…When the model is good enough, a small h may not fully leverage the accurate gradient information. As evidence, approaches (Xu et al., 2022; Mora et al., 2021) based on differentiable simulators typically adopt longer unrolls compared to model-based approaches. Therefore, with SN, more accurate multi-step predictions should enable more efficient learning without making the underlying optimization process harder.…”
Section: Benefit of Smoothness Regularization (citation type: mentioning, confidence: 99%)
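For context on the horizon h discussed above, a generic way to write the differentiable unroll (a standard formulation stated here as an assumption, not taken from the quoted paper) is:

\[
J_h(\theta) = \sum_{t=0}^{h-1} \gamma^t\, r(s_t, a_t), \qquad a_t = \pi_\theta(s_t), \qquad s_{t+1} = f(s_t, a_t),
\]
\[
\frac{d s_{t+1}}{d \theta} = \frac{\partial f}{\partial s_t}\frac{d s_t}{d \theta} + \frac{\partial f}{\partial a_t}\frac{d a_t}{d \theta}, \qquad
\frac{d a_t}{d \theta} = \frac{\partial \pi_\theta}{\partial \theta}\Big|_{s_t} + \frac{\partial \pi_\theta}{\partial s_t}\frac{d s_t}{d \theta}.
\]

Because this recursion multiplies dynamics Jacobians across the horizon, a larger h exposes more of the simulator's gradient information but also compounds those Jacobians, which is the exploding/vanishing-gradient trade-off behind the longer unrolls mentioned above.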