2020
DOI: 10.1609/aaai.v34i04.6003
On the Role of Weight Sharing During Deep Option Learning

Abstract: The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, i…

Cited by 10 publications (13 citation statements) · References 12 publications
“…the agent reaches the blue zone, it obtains a reward of +20 as opposed to a reward of +10 at the red-green junction. In Figure 1, we plot the rewards obtained per cycle for both the AR-RL agent and a DR-RL agent, and show that the hierarchical AR policy gradient performs better than its DR counterpart proposed by Riemer et al. (2020). Finally, we illustrate the asymptotic convergence of the actor and critic parameters in Figure 2.…”
Section: Results
Confidence: 99%
“…Finally, we look at the susceptibility of our framework to traps, and compare it to the DR setting proposed by Riemer et al. (2020). Figure 3(b) depicts a grid world environment characterized by sparse rewards.…”
Section: Results
Confidence: 99%