The large number of interactions with the environment that an agent requires is one of the most important problems in reinforcement learning (RL). To deal with this problem, several data-efficient RL algorithms have been proposed and successfully applied in practice. Unlike previous research, which focuses on the policy evaluation and policy improvement stages, we actively select informative samples by leveraging an entropy-based optimal sampling strategy that takes the initial sample set into consideration. During the initial sampling process, information entropy is used to characterize the potential samples, and the agent selects the most informative ones via an optimization method. In this way, the initial samples are more informative than those obtained with random or fixed strategies, so a more accurate initial dynamics model and policy can be learned. The proposed optimal sampling method thus guides the agent to search in a more informative region. Experimental results on standard benchmark problems involving a pendulum, a cart pole, and a cart double pendulum show that our optimal sampling strategy achieves better data efficiency.

INDEX TERMS Reinforcement learning, information entropy, optimal sampling, data efficiency.
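The abstract does not specify how the entropy of a candidate sample is computed, so the following is only an illustrative sketch of the general idea: score each candidate initial state by the differential entropy of the model's (here, Gaussian) prediction at that state, and pick the highest-scoring one. The helpers `gaussian_entropy`, `select_most_informative`, and the toy `predict_variance` model are hypothetical names invented for this example, not part of the paper's method.

```python
import math

def gaussian_entropy(variance):
    """Differential entropy of a 1-D Gaussian: H = 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)

def select_most_informative(candidates, predict_variance):
    """Pick the candidate state whose model prediction is most uncertain
    (highest entropy), i.e. the most informative sample to collect next."""
    return max(candidates, key=lambda s: gaussian_entropy(predict_variance(s)))

# Toy dynamics-model stand-in: predictive variance grows with distance
# from the already-explored region around state 0.
predict_variance = lambda s: 0.1 + s ** 2

candidates = [0.0, 0.5, -1.5, 2.0]
best = select_most_informative(candidates, predict_variance)
print(best)  # 2.0: farthest from the explored region, highest entropy
```

Because Gaussian entropy is monotone in the variance, this degenerates to picking the highest-variance candidate; an entropy formulation generalizes to non-Gaussian predictive distributions.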