2021
DOI: 10.48550/arxiv.2105.12991
Preprint

Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization

Taisuke Kobayashi

Abstract: This paper addresses a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization, and derives a new optimization method using forward KL divergence. Although RL originally aims to maximize return indirectly through optimization of the policy, recent work by Levine proposed a different derivation process with explicit consideration of optimality as a stochastic variable. This paper follows this concept and formulates the traditional learning laws for both value…
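For reference on the two divergence directions named in the abstract, the standard definitions of reverse and forward KL between a policy \pi and a target distribution p are given below. In the control-as-inference framing, p would be a distribution induced by the optimality variable; its exact form is not reproduced in this snippet, so p here is only a placeholder.

% Reverse KL: the policy's own samples weight the log-ratio (mode-seeking)
\mathrm{KL}(\pi \,\|\, p) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\left[ \log \frac{\pi(a \mid s)}{p(a \mid s)} \right]

% Forward KL: the target distribution weights the log-ratio (mass-covering)
\mathrm{KL}(p \,\|\, \pi) = \mathbb{E}_{a \sim p(\cdot \mid s)}\!\left[ \log \frac{p(a \mid s)}{\pi(a \mid s)} \right]

These are only the generic definitions; the paper's specific choice of target distribution and the resulting learning laws for the value function and policy are given in the full text.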


Cited by 1 publication (1 citation statement)
References 27 publications
“…position) are unobserved, making this task a partially observed MDP (POMDP). Note that the default setting for Minitaur tasks is unrealistic, as pointed out in the literature [25]. Therefore, it was modified as shown in Table III (arguments not listed are left at default).…”
Section: Demonstration
Citation type: mentioning
confidence: 99%