Hill Climbing on Value Estimates for Search-control in Dyna

Pan, Yangchen; Yao, Hengshuai; Farahmand, Amir-massoud; White, Martha

doi:10.24963/ijcai.2019/445

Cited by 5 publications

(25 citation statements)

References 1 publication

“…We present the Dyna architecture with the frequency-based search-control (Algorithm 1) in this section. It combines the idea that samples from high-frequency regions of the state space is important, as discussed in the previous section, and the hill climbing process to effectively draw samples from those regions, as introduced by Pan et al (2019). We omit implementation details such as preconditioning, noisy gradient for the hill climbing process, and refer readers to Appendix A.6 and A.7.…”

Section: Frequency-based Search-control In Dyna (mentioning)

confidence: 99%

“…Gu et al (2016) utilizes local linear models to generate optimal trajectories through iLQR (Li & Todorov, 2004). Pan et al (2019) suggest a method to generate states for the search-control queue by hill climbing on the value function estimate. This paper proposes an alternative perspective to design search-control strategy: we can sample more frequently from the state space where the value function is more difficult to estimate.…”

Section: Introduction (mentioning)

confidence: 99%

“…We then propose a method to locally measure the frequency of a point in a function's domain and provide a theoretical justification for our method (Theorem 1 in Section 3.2). We use the hill climbing approach of Pan et al (2019) to adapt our method to design a search-control mechanism for the Dyna architecture (Section 4). We conduct experiments on benchmark and challenging domains to illustrate the properties and utilities of our method (Section 5).…”

Section: Introduction (mentioning)

confidence: 99%

“…The most relevant work to ours is hill climbing Dyna (Pan et al, 2019). Pan et al (2019) proposes a search-control mechanism based on hill climbing on the value estimates (see Algorithm 3 in Appendix A.3). We briefly review the key steps of their algorithm, which is called (Hill Climbing)HC-Dyna, as it helps to understand ours.…”

Section: Introduction (mentioning)

confidence: 99%
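
The hill-climbing idea described in the statements above (generating states for the search-control queue by ascending the value estimate) can be illustrated with a minimal sketch. This is not the authors' algorithm: the one-dimensional `value_estimate`, the finite-difference gradient, the step size, and the optional Gaussian noise are all assumptions chosen for illustration.

```python
import random


def value_estimate(s):
    """Hypothetical stand-in for a learned value function (peak at s = 2)."""
    return -(s - 2.0) ** 2


def grad(f, s, eps=1e-5):
    """Central finite-difference derivative of f at s."""
    return (f(s + eps) - f(s - eps)) / (2.0 * eps)


def hill_climb_states(v, s0, steps=50, step_size=0.1, noise_std=0.0, rng=None):
    """Gradient ascent on v starting from s0.

    Every visited state is collected as a candidate for the
    search-control queue (an assumed, simplified policy).
    """
    rng = rng or random.Random(0)
    queue = []
    s = s0
    for _ in range(steps):
        # Move uphill on the value estimate; noise is optional.
        s = s + step_size * grad(v, s) + rng.gauss(0.0, noise_std)
        queue.append(s)
    return queue


# Start from a (hypothetical) visited state and climb toward higher value.
queue = hill_climb_states(value_estimate, s0=-1.0)
```

In a Dyna-style loop, states drawn from `queue` would then be used to query the model for simulated experience; that step is omitted here.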

Frequency-based Search-control in Dyna
Pan, Mei, Farahmand. 2020. Preprint. Self Cite.

Model-based reinforcement learning has been empirically demonstrated as a successful strategy to improve sample efficiency. In particular, Dyna is an elegant model-based architecture that integrates learning and planning and offers great flexibility in how a model is used. One of the most important components in Dyna is search-control, the process of generating states or state-action pairs from which we query the model to acquire simulated experiences. Search-control is critical in improving learning efficiency. In this work, we propose a simple and novel search-control strategy that searches high-frequency regions of the value function. Our main intuition is built on the Shannon sampling theorem from signal processing, which indicates that a high-frequency signal requires more samples to reconstruct. We empirically show that a high-frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states from high-frequency regions of the value function to query the model to acquire more samples. We develop a simple strategy to locally measure the frequency of a function by gradient and Hessian norms, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna, and conduct experiments to show its properties and effectiveness on benchmark domains.
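
As a rough illustration of measuring local frequency "by gradient and Hessian norms", the sketch below scores a one-dimensional function by the sum of its squared first and second derivatives. The abstract does not give the exact criterion, so this particular combination, the finite-difference derivatives, and the test functions `slow` and `fast` are assumptions.

```python
import math


def first_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def local_frequency_score(f, x):
    """Assumed score: squared gradient norm plus squared Hessian norm.

    In one dimension both norms reduce to absolute values of derivatives.
    """
    return first_derivative(f, x) ** 2 + second_derivative(f, x) ** 2


def slow(x):
    return math.sin(x)        # low-frequency signal


def fast(x):
    return math.sin(5.0 * x)  # higher-frequency signal


# Average the score over a grid of sample points.
xs = [0.1 * i for i in range(1, 30)]
slow_score = sum(local_frequency_score(slow, x) for x in xs) / len(xs)
fast_score = sum(local_frequency_score(fast, x) for x in xs) / len(xs)
```

Under this assumed score the faster sinusoid scores higher at every grid point, which matches the intuition that such regions would be sampled more often.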

A learning search algorithm with propagational reinforcement learning
Zhang. 2021. Appl Intell.

Prioritized experience replay based on dynamics priority
Li, Qian, Song. 2024. Sci Rep.

Experience replay has been instrumental in achieving significant advancements in reinforcement learning by increasing the utilization of data. To further improve sampling efficiency, prioritized experience replay (PER) was proposed. This algorithm prioritizes experiences based on the temporal difference error (TD error), enabling the agent to learn from the more valuable experiences stored in the experience pool. While various prioritized algorithms have been proposed, they ignore the dynamic changes in experience value during training and merely combine different priority criteria in a fixed or linear manner. In this paper, we present a novel prioritized experience replay algorithm called PERDP, which employs a dynamic priority adjustment framework. PERDP adaptively adjusts the weight of each criterion based on the average priority level of the experience pool and evaluates experiences’ value according to the current network. We apply this algorithm to the SAC model and conduct experiments in the OpenAI Gym environment. The experimental results demonstrate that PERDP converges faster than PER.
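
For readers unfamiliar with TD-error prioritization, here is a minimal sketch of the standard proportional form of PER, in which priority is `(|td_error| + eps) ** alpha` and sampling probability is the normalized priority. It illustrates plain PER only, not PERDP's dynamic weighting; the values of `alpha` and `eps` and the example TD errors are assumptions.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i = (|delta_i| + eps) ** alpha,
    normalized so the probabilities sum to 1."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, rng=None, alpha=0.6, eps=1e-6):
    """Draw replay-buffer indices with probability proportional to priority."""
    rng = rng or random.Random(0)
    probs = per_probabilities(td_errors, alpha=alpha, eps=eps)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


# Hypothetical TD errors for four stored transitions.
td_errors = [0.1, 2.0, 0.5, 0.0]
probs = per_probabilities(td_errors)
batch = sample_indices(td_errors, batch_size=8)
```

The small `eps` keeps transitions with zero TD error from never being replayed, and `alpha` controls how strongly sampling favors large errors (`alpha = 0` would give uniform sampling).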
