2020
DOI: 10.48550/arxiv.2007.06159
Preprint

Implicit Distributional Reinforcement Learning

Abstract: To improve the sample efficiency of policy-gradient based reinforcement learning algorithms, we propose implicit distributional actor critic (IDAC) that consists of a distributional critic, built on two deep generator networks (DGNs), and a semi-implicit actor (SIA), powered by a flexible policy distribution. We adopt a distributional perspective on the discounted cumulative return and model it with a state-action-dependent implicit distribution, which is approximated by the DGNs that take state-action pairs a…
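
As a rough illustration of what a state-action-dependent implicit distribution over returns can look like, the sketch below pushes random noise through a small network conditioned on a state-action pair and treats the outputs as return samples. It is a minimal sketch under stated assumptions, not the IDAC implementation: the abstract is truncated here, so the noise input, the layer sizes, and the names `make_generator` and `sample_returns` are all hypothetical. The weights are random rather than trained, and only one generator is shown, whereas the paper's critic uses two.

```python
import math
import random


def make_generator(state_dim, action_dim, noise_dim, hidden=16, seed=0):
    """Build a tiny random-weight MLP G(state, action, noise) -> scalar.

    Hypothetical stand-in for a deep generator network. The weights are
    random and untrained; the noise input is an assumption of this sketch.
    """
    rng = random.Random(seed)
    in_dim = state_dim + action_dim + noise_dim
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
          for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]

    def generator(state, action, noise):
        # Concatenate the inputs, apply one tanh hidden layer, then a linear output.
        x = list(state) + list(action) + list(noise)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return sum(w * hi for w, hi in zip(w2, h))

    return generator


def sample_returns(generator, state, action, noise_dim, n_samples, seed=1):
    """Draw return samples for one state-action pair by resampling the noise."""
    rng = random.Random(seed)
    return [
        generator(state, action, [rng.gauss(0.0, 1.0) for _ in range(noise_dim)])
        for _ in range(n_samples)
    ]


g = make_generator(state_dim=3, action_dim=1, noise_dim=2)
samples = sample_returns(g, [0.1, -0.2, 0.3], [0.5], noise_dim=2, n_samples=200)
mean_return = sum(samples) / len(samples)  # a point estimate, if one is needed
```

The distribution is implicit in the sense that it is defined only through this sampling procedure: no density is written down, and summary statistics such as the mean are estimated from the drawn samples.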

Citations: cited by 1 publication (1 citation statement)
References: 36 publications
“…Our resulting algorithm, E2DC, shows performance gains in difficult continuous control tasks and improvements in distribution matching. Future work includes integrating our changes with orthogonal advances in distributional RL (Kuznetsov et al, 2020) and more expressive policies (Yue et al, 2020;Ward et al, 2019). Due to our method potentially learning more accurate distributions of the true returns, our work here can be leveraged for specific use cases, such as risk-seeking policies in stock market trading strategies, or risk-averse learning for robotics.…”
Section: Discussion (citation type: mentioning)
confidence: 99%