2019
DOI: 10.48550/arxiv.1906.02771
Preprint

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Cited by 9 publications
(11 citation statements)
References 0 publications
“…Dinh et al (2015); Kingma & Dhariwal (2018) propose the coupling method to make the Jacobian triangular and ensure the forward and inverse can be computed with a single pass. The applications of NF include image generation (Ho et al, 2019; Kingma & Dhariwal, 2018), video generation (Kumar et al, 2019) and reinforcement learning (Mazoure et al, 2020; Ward et al, 2019; Touati et al, 2020).…”
Section: Related Work | mentioning
confidence: 99%
“…al. [43] altered the choice of policy distribution from factored Gaussian in vanilla SAC to Normalizing flow policies for improving exploration. Campo et.…”
Section: Preliminaries and Motivation | mentioning
confidence: 99%
“…In Haarnoja et al (2018a), SAC is proposed to mitigate the policy's expressiveness issue while retaining tractable optimization; with the policy modeled with either a Gaussian or a mixture of Gaussian, SAC adopts a maximum entropy RL objective function to encourage exploration. The normalizing flow (Rezende and Mohamed, 2015; Dinh et al, 2016) based techniques have been recently applied to design a flexible policy in both on-policy (Tang and Agrawal, 2018) and off-policy settings (Ward et al, 2019).…”
Section: Related Work | mentioning
confidence: 99%