2018
DOI: 10.48550/arxiv.1809.04474
Preprint
Multi-task Deep Reinforcement Learning with PopArt



Cited by 15 publications (17 citation statements) · References 0 publications
“…The AWAC codebase considers several alternatives, including softmax normalization. An interesting alternative is PopArt [19,21]; by standardizing the output of our critic networks we rescale advantages and get the benefits of PopArt's stability and hyperparameter insensitivity for free. The second challenge is the temperature hyperparameter β.…”
Section: Binary vs Exponential Filters
Citation type: mentioning (confidence: 99%)
“…PopArt is implemented as described in [19] and [21]. We use an adaptive step size when computing the normalization statistics in order to reduce reliance on initialization.…”
Section: B2 PopArt
Citation type: mentioning (confidence: 99%)
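The excerpt above refers to a PopArt implementation that uses an adaptive step size for the normalization statistics to reduce reliance on initialization. Below is a minimal, illustrative sketch of that idea: running mean/variance of targets tracked with an adaptive rate, plus the output-preserving rescaling of the final linear layer that gives PopArt its name. All class and variable names here are ours, not from the cited code.

```python
import numpy as np

class PopArtNormalizer:
    """Sketch of PopArt-style adaptive target normalization.

    Tracks a running mean and second moment of the targets with an
    adaptive step size (which behaves like 1/t early in training and
    approaches beta later), and rescales the final linear layer
    (w, b) in place so the network's unnormalized outputs are
    preserved when the statistics change.
    """

    def __init__(self, beta=3e-4):
        self.beta = beta
        self.mu = 0.0   # running mean of targets
        self.nu = 1.0   # running second moment of targets
        self.t = 0      # update count, drives the adaptive step size

    @property
    def sigma(self):
        # standard deviation, clamped away from zero for stability
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def update(self, targets, w, b):
        """Update statistics from a batch of targets and rescale (w, b)
        so that sigma_new * (w @ x + b_new) + mu_new equals the old
        unnormalized output for every input x."""
        old_mu, old_sigma = self.mu, self.sigma
        self.t += 1
        # adaptive step size: reduces sensitivity to initial mu, nu
        beta_t = self.beta / (1.0 - (1.0 - self.beta) ** self.t)
        self.mu = (1 - beta_t) * self.mu + beta_t * np.mean(targets)
        self.nu = (1 - beta_t) * self.nu + beta_t * np.mean(np.square(targets))
        # output-preserving rescale of the last layer (the "Art" step)
        w *= old_sigma / self.sigma
        b[:] = (old_sigma * b + old_mu - self.mu) / self.sigma

    def normalize(self, targets):
        # map targets to roughly zero mean, unit variance
        return (targets - self.mu) / self.sigma
```

The invariant worth checking is that `sigma * (w @ x + b) + mu` is unchanged by `update`, so the statistics can shift without perturbing the value estimates the rest of the agent sees.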
“…Adaptive normalization using Pop-Art: In our preliminary experiments we observed that DSE-REINFORCE was selectively solving some tasks but not others. For this reason we use the adaptive rescaling method Pop-Art [30,31] to normalize the discounted rewards R_t(τ_{i,j}^m) to have zero mean and unit variance before each training iteration. Thus all tasks affect the gradient equally.…”
Section: DSE-REINFORCE
Citation type: mentioning (confidence: 99%)
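The excerpt above uses normalization so that every task contributes comparably to the multi-task policy gradient. A simple batch version of that idea can be sketched as below; note that Pop-Art proper maintains running statistics rather than per-batch ones, and the function name is ours for illustration.

```python
import numpy as np

def normalize_returns_per_task(returns_by_task, eps=1e-8):
    """Normalize each task's discounted returns to zero mean and unit
    variance, so no single task's reward scale dominates the gradient.

    returns_by_task: dict mapping a task id to a 1-D array of returns.
    Returns a dict of the same shape with per-task standardized values.
    """
    return {
        task: (r - r.mean()) / (r.std() + eps)
        for task, r in returns_by_task.items()
    }
```

After this transform, a task whose returns are in the thousands and one whose returns are near zero produce gradient contributions on the same scale.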
“…The IMPALA architecture [6] can scale training of actor-critic methods across many machines to achieve a high throughput, enabling advances in multi-task RL [10]. This is achieved by a combination of algorithmic and engineering advances.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)