2018
DOI: 10.48550/arxiv.1809.04474
Preprint
Multi-task Deep Reinforcement Learning with PopArt



Cited by 15 publications (17 citation statements) · References 0 publications
“…The AWAC codebase considers several alternatives, including softmax normalization. An interesting alternative is PopArt [19,21]; by standardizing the output of our critic networks we rescale advantages and get the benefits of PopArt's stability and hyperparameter insensitivity for free. The second challenge is the temperature hyperparameter β.…”
Section: Binary vs Exponential Filters
Citation type: mentioning (confidence: 99%)
“…PopArt is implemented as described in [19] and [21]. We use an adaptive step size when computing the normalization statistics in order to reduce reliance on initialization.…”
Section: B2 PopArt
Citation type: mentioning (confidence: 99%)
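The excerpt above refers to a PopArt implementation that uses an adaptive step size for the normalization statistics to reduce reliance on initialization. Below is a minimal, illustrative sketch of that idea: running mean/variance of targets tracked with an adaptive rate, plus the output-preserving rescaling of the final linear layer that gives PopArt its name. All class and variable names here are ours, not from the cited code.

```python
import numpy as np

class PopArtNormalizer:
    """Sketch of PopArt-style adaptive target normalization.

    Tracks a running mean and second moment of the targets with an
    adaptive step size (which behaves like 1/t early in training and
    approaches beta later), and rescales the final linear layer
    (w, b) in place so the network's unnormalized outputs are
    preserved when the statistics change.
    """

    def __init__(self, beta=3e-4):
        self.beta = beta
        self.mu = 0.0   # running mean of targets
        self.nu = 1.0   # running second moment of targets
        self.t = 0      # update count, drives the adaptive step size

    @property
    def sigma(self):
        # standard deviation, clamped away from zero for stability
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def update(self, targets, w, b):
        """Update statistics from a batch of targets and rescale (w, b)
        so that sigma_new * (w @ x + b_new) + mu_new equals the old
        unnormalized output for every input x."""
        old_mu, old_sigma = self.mu, self.sigma
        self.t += 1
        # adaptive step size: reduces sensitivity to initial mu, nu
        beta_t = self.beta / (1.0 - (1.0 - self.beta) ** self.t)
        self.mu = (1 - beta_t) * self.mu + beta_t * np.mean(targets)
        self.nu = (1 - beta_t) * self.nu + beta_t * np.mean(np.square(targets))
        # output-preserving rescale of the last layer (the "Art" step)
        w *= old_sigma / self.sigma
        b[:] = (old_sigma * b + old_mu - self.mu) / self.sigma

    def normalize(self, targets):
        # map targets to roughly zero mean, unit variance
        return (targets - self.mu) / self.sigma
```

The invariant worth checking is that `sigma * (w @ x + b) + mu` is unchanged by `update`, so the statistics can shift without perturbing the value estimates the rest of the agent sees.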
“…Adaptive normalization using Pop-Art: In our preliminary experiments we observed that DSE-REINFORCE was selectively solving some tasks but not others. For this reason we use the adaptive rescaling method Pop-Art [30,31] to normalize the discounted rewards R_t(τ_{i,j}^m) to have zero mean and unit variance before each training iteration. Thus all tasks affect the gradient equally.…”
Section: DSE-REINFORCE
Citation type: mentioning (confidence: 99%)
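The excerpt above uses normalization so that every task contributes comparably to the multi-task policy gradient. A simple batch version of that idea can be sketched as below; note that Pop-Art proper maintains running statistics rather than per-batch ones, and the function name is ours for illustration.

```python
import numpy as np

def normalize_returns_per_task(returns_by_task, eps=1e-8):
    """Normalize each task's discounted returns to zero mean and unit
    variance, so no single task's reward scale dominates the gradient.

    returns_by_task: dict mapping a task id to a 1-D array of returns.
    Returns a dict of the same shape with per-task standardized values.
    """
    return {
        task: (r - r.mean()) / (r.std() + eps)
        for task, r in returns_by_task.items()
    }
```

After this transform, a task whose returns are in the thousands and one whose returns are near zero produce gradient contributions on the same scale.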
“…The IMPALA architecture [6] can scale training of actor-critic methods across many machines to achieve a high throughput, enabling advances in multi-task RL [10]. This is achieved by a combination of algorithmic and engineering advances.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)