2019
DOI: 10.48550/arXiv.1909.01500
Preprint

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

Abstract: Since the recent advent of deep reinforcement learning for game play [1] and simulated robotic control (e.g. [2]), a multitude of new algorithms have flourished. Most are model-free algorithms which can be categorized into three families: deep Q-learning, policy gradients, and Q-value policy gradients. These have developed along separate lines of research, such that few, if any, code bases incorporate all three kinds. Yet these algorithms share a great depth of common deep reinforcement learning machinery. We …
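To make the abstract's claim of shared machinery concrete, the following is a minimal sketch of an rlpyt-style training script in the sampler / algorithm / agent / runner composition the repository documents, here instantiated with the deep Q-learning family (DQN on Atari). The module paths, class names, and keyword arguments follow the repository's example scripts as best recalled and may differ across versions; the small step counts are placeholders for illustration only.

# Minimal sketch of an rlpyt training run (DQN on Atari), assuming the
# sampler/algorithm/agent/runner layout of the repository's examples.
# Verify module paths and argument names against the installed version.
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.atari.atari_env import AtariEnv
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
from rlpyt.runners.minibatch_rl import MinibatchRlEval
from rlpyt.utils.logging.context import logger_context

def build_and_train(game="pong", run_id=0, cuda_idx=None):
    # Sampler: gathers batches of experience shaped (time T x environments B).
    sampler = SerialSampler(
        EnvCls=AtariEnv,
        env_kwargs=dict(game=game),
        eval_env_kwargs=dict(game=game),
        batch_T=4,                      # time steps per sampling iteration
        batch_B=1,                      # parallel environment instances
        eval_n_envs=2,
        eval_max_steps=int(1e4),
    )
    algo = DQN(min_steps_learn=1e3)     # loss and optimization (deep Q-learning family)
    agent = AtariDqnAgent()             # holds the PyTorch model and action selection
    runner = MinibatchRlEval(           # orchestrates alternating sampling and training
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=int(5e4),               # placeholder training budget for illustration
        log_interval_steps=int(1e3),
        affinity=dict(cuda_idx=cuda_idx),
    )
    with logger_context("example_dqn", run_id, "dqn_" + game, dict(game=game)):
        runner.train()

if __name__ == "__main__":
    build_and_train()

Swapping the algorithm and agent classes (for example, to a policy-gradient or Q-value policy-gradient method) reuses the same sampler and runner, which is the shared machinery the abstract refers to.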

Cited by 34 publications (37 citation statements) | References 15 publications

Citation statements (ordered by relevance):
“…For all approaches, including baselines, we run at least 15 random seeds for 6 different learning rates, {0.001, 0.003, 0.0001, 0.0003, 0.00001, 0.00003}, and report the best learning rate for each. Other hyperparameters are taken as default in the codebase (Stooke & Abbeel, 2019; van der Pol et al., 2020).…”
Section: Methods
confidence: 99%
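As a rough picture of the sweep protocol quoted above (at least 15 seeds for each of 6 learning rates, reporting the best rate), the sketch below runs every learning-rate/seed pair and selects the rate with the best mean score. Here train_and_evaluate is a hypothetical stand-in for a full training run with the codebase's default hyperparameters; only the selection logic is meant literally.

# Sketch of the learning-rate sweep described in the excerpt above.
# train_and_evaluate is a hypothetical stand-in for a full training run
# with the codebase's default hyperparameters; it returns a toy score.
import numpy as np

LEARNING_RATES = [0.001, 0.003, 0.0001, 0.0003, 0.00001, 0.00003]
N_SEEDS = 15

def train_and_evaluate(lr: float, seed: int) -> float:
    rng = np.random.default_rng(seed)
    return float(rng.normal(loc=-abs(np.log10(lr) + 3.5), scale=0.5))

# Run every (learning rate, seed) pair and keep the per-rate scores.
scores = {lr: [train_and_evaluate(lr, s) for s in range(N_SEEDS)]
          for lr in LEARNING_RATES}

# Report the learning rate with the best mean score across seeds.
best_lr = max(scores, key=lambda lr: float(np.mean(scores[lr])))
print(f"best learning rate: {best_lr} (mean score {np.mean(scores[best_lr]):.3f})")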
“…While H2.0 is a fast simulator, we find that the performance of the overall simulation+training loop is bottlenecked by factors like synchronization of parallel environments and reloading of assets upon episode reset. An exciting and complementary future direction is holistically reorganizing the rendering+physics+RL interplay as studied by [75–80]. Concretely, as illustrated in Figure 3, there is idle GPU time when rendering is faster than physics, because inference waits for both o_t and s_{t+1} to be ready despite not needing s_{t+1}.…”
Section: Societal Impacts, Limitations, and Conclusion
confidence: 99%
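The idle-GPU observation in this excerpt (policy inference waits for s_{t+1} even though it only needs o_t) can be pictured with a toy overlap sketch. The functions below are hypothetical stand-ins that merely sleep for fixed durations; the point is the ordering, in which the overlapped loop launches the physics step toward s_{t+1} in a background thread while inference runs on o_t.

# Toy illustration of hiding physics latency behind policy inference.
# physics_step, render, and policy_inference are hypothetical stand-ins
# that sleep for fixed durations; no real simulator or network is used.
import time
from concurrent.futures import ThreadPoolExecutor

def physics_step(state, action):
    time.sleep(0.010)              # pretend physics takes 10 ms
    return state + 1

def render(state):
    time.sleep(0.002)              # pretend rendering takes 2 ms (faster than physics)
    return float(state)

def policy_inference(obs):
    time.sleep(0.004)              # pretend inference takes 4 ms
    return int(obs) % 2

def synchronous_loop(steps=50):
    # Inference starts only after both o_t and s_{t+1} exist, so the GPU idles.
    state, action, start = 0, 0, time.time()
    for _ in range(steps):
        obs = render(state)
        state = physics_step(state, action)
        action = policy_inference(obs)
    return time.time() - start

def overlapped_loop(steps=50):
    # Launch the physics step toward s_{t+1} in the background and run
    # inference on o_t concurrently, hiding part of the physics latency.
    state, action, start = 0, 0, time.time()
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(steps):
            obs = render(state)
            physics_future = pool.submit(physics_step, state, action)
            action = policy_inference(obs)   # overlaps with physics_step
            state = physics_future.result()
    return time.time() - start

if __name__ == "__main__":
    print(f"synchronous: {synchronous_loop():.2f}s  overlapped: {overlapped_loop():.2f}s")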
“…• rlpyt is an open-source repository of modular and parallelised implementations of various deep reinforcement-learning algorithms [569].…”
Section: Literature Walkthrough
confidence: 99%
