Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/445

Hill Climbing on Value Estimates for Search-control in Dyna

Abstract: Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism that generates the states and actions from which the agent queries the model, and it remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from hill climbing (HC) on the current estimate of the value function. This has the effect of propagating value from…
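The idea in the abstract, generating search-control states by following a hill-climbing trajectory on the current value estimate, can be illustrated with a minimal sketch. This is not the authors' implementation: the quadratic `value_estimate`, its analytic `value_gradient`, the step size, and the queue size are all hypothetical stand-ins chosen for illustration, and details mentioned in the citing work (preconditioning, noisy gradients) are left out.

```python
from collections import deque


def value_estimate(state):
    """Hypothetical value estimate that peaks at the state (1.0, 1.0)."""
    x, y = state
    return -((x - 1.0) ** 2 + (y - 1.0) ** 2)


def value_gradient(state):
    """Analytic gradient of the hypothetical value estimate above."""
    x, y = state
    return (-2.0 * (x - 1.0), -2.0 * (y - 1.0))


def hill_climb_states(start, step_size=0.1, num_steps=20):
    """Run plain gradient ascent on the value estimate from `start`
    and return every state visited along the trajectory."""
    trajectory = []
    state = start
    for _ in range(num_steps):
        grad = value_gradient(state)
        state = tuple(s + step_size * g for s, g in zip(state, grad))
        trajectory.append(state)
    return trajectory


# Assumed search-control queue: a bounded buffer of states from which a
# Dyna agent would later query its model for simulated experience.
search_control_queue = deque(maxlen=1000)
search_control_queue.extend(hill_climb_states((0.0, 0.0)))
```

In a full Dyna agent, states drawn from `search_control_queue` would be paired with actions and passed to the learned model to produce simulated transitions for planning updates; that part is omitted here.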

Cited by 5 publications (25 citation statements)
References 1 publication
“…We present the Dyna architecture with the frequency-based search-control (Algorithm 1) in this section. It combines the idea that samples from high-frequency regions of the state space is important, as discussed in the previous section, and the hill climbing process to effectively draw samples from those regions, as introduced by Pan et al (2019). We omit implementation details such as preconditioning, noisy gradient for the hill climbing process, and refer readers to Appendix A.6 and A.7.…”
Section: Frequency-based Search-control in Dyna (mentioning)
confidence: 99%
“…Gu et al (2016) utilizes local linear models to generate optimal trajectories through iLQR (Li & Todorov, 2004). Pan et al (2019) suggest a method to generate states for the search-control queue by hill climbing on the value function estimate. This paper proposes an alternative perspective to design search-control strategy: we can sample more frequently from the state space where the value function is more difficult to estimate.…”
Section: Introduction (mentioning)
confidence: 99%