CleanRL is an open-source library that provides high-quality single-file implementations of Deep Reinforcement Learning algorithms. It offers a simpler yet scalable development experience through a straightforward codebase and integrated production tools that help manage and scale experiments. In CleanRL, we put all the details of an algorithm into a single file, making these performance-relevant details easier to recognize. Additionally, an experiment tracking feature logs metrics, hyperparameters, videos of an agent's gameplay, dependencies, and more to the cloud. Despite the succinct implementations, we have also designed tools to help scale experiments, at one point orchestrating runs on more than 2000 machines simultaneously via Docker and cloud providers. Finally, we have validated the quality of the implementations by benchmarking them against a variety of environments.
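As a rough illustration of the single-file tracking pattern described above, here is a minimal sketch (an assumed pattern, not CleanRL's exact code): metrics are written to TensorBoard, and Weights & Biases can mirror them when tracking is enabled via `sync_tensorboard`. The run name, project name, and the `track` toggle are all hypothetical stand-ins for what CleanRL exposes through its CLI arguments.

```python
# Minimal sketch of single-file experiment tracking (assumed pattern).
from torch.utils.tensorboard import SummaryWriter

track = False  # in CleanRL this is toggled by a CLI flag; False keeps the sketch offline
run_name = "CartPole-v1__ppo__seed1"  # hypothetical run name

if track:
    import wandb
    # sync_tensorboard mirrors everything written to the SummaryWriter below.
    wandb.init(project="cleanrl-demo", name=run_name, sync_tensorboard=True)

writer = SummaryWriter(f"runs/{run_name}")
for global_step in range(10):
    episodic_return = 100.0  # placeholder; comes from the environment in practice
    writer.add_scalar("charts/episodic_return", episodic_return, global_step)
writer.close()
```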
Code-level optimizations, the low-level techniques used in the implementation of algorithms, have generally been considered tangential and often do not appear in the published pseudocode of Reinforcement Learning (RL) algorithms. However, recent studies suggest that these optimizations are critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization, known as "early stopping", implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This optimization technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) divergence between the target policy and the current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are that 1) the performance of PPO is sensitive to the number of update iterations per epoch (K), 2) the early stopping optimizations (KLE-Stop and KLE-Rollback) mitigate this sensitivity by dynamically adjusting the actual number of update iterations within an epoch, and 3) the early stopping optimizations can serve as a convenient alternative to tuning K.
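To make the mechanism concrete, the following is a hedged sketch of KLE-Stop inside a PPO-style update loop. The toy policy network, the fake rollout batch (`obs`, `actions`, `old_logprobs`, `advantages`), the KL approximation, and the threshold value are all illustrative assumptions, not the paper's exact code.

```python
# Hypothetical sketch of KLE-Stop in a PPO update loop (illustrative values).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 2))             # toy discrete policy
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(64, 4)                            # fake rollout batch
actions = torch.randint(0, 2, (64,))
old_logprobs = torch.randn(64).detach()
advantages = torch.randn(64)

target_kl = 0.015                                   # KL threshold (illustrative)
clip_coef = 0.2
K = 10                                              # update iterations per epoch

for i in range(K):
    dist = torch.distributions.Categorical(logits=policy(obs))
    new_logprobs = dist.log_prob(actions)
    logratio = new_logprobs - old_logprobs
    ratio = logratio.exp()

    # Approximate mean KL between the target (data-collecting) policy
    # and the current policy.
    approx_kl = (-logratio).mean()

    # KLE-Stop: abandon the remaining update iterations once the policy
    # has drifted too far from the one that collected the data.
    # (KLE-Rollback would additionally revert the parameters to their
    # values before the offending gradient step.)
    if approx_kl > target_kl:
        break

    pg_loss = -torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef) * advantages,
    ).mean()
    optimizer.zero_grad()
    pg_loss.backward()
    optimizer.step()
```

Because the loop exits as soon as the KL estimate crosses the threshold, the effective number of update iterations adapts per batch, which is why the technique reduces sensitivity to the choice of K.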