2018 Information Theory and Applications Workshop (ITA)
DOI: 10.1109/ita.2018.8503252
Efficient Exploration Through Bayesian Deep Q-Networks

Abstract: We study reinforcement learning (RL) in high dimensional episodic Markov decision processes (MDP). We consider value-based RL when the optimal Q-value is a linear function of d-dimensional state-action feature representation. For instance, in deep-Q networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bound…
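The abstract's setting, where Q(s, a) is linear in a d-dimensional feature vector phi(s, a), makes the posterior-sampling idea concrete: maintain a Gaussian posterior over the linear weights via Bayesian linear regression, draw one weight sample per decision, and act greedily under it. The sketch below is illustrative only; the class name, hyperparameters, and update targets are my own assumptions, not the paper's exact LINPSRL algorithm.

```python
import numpy as np

class LinearPosteriorQ:
    """Thompson sampling over a linear Q-value head, Q(s,a) = phi(s,a)^T w.

    Maintains a conjugate Gaussian posterior over w (Bayesian linear
    regression with known noise variance). Hypothetical sketch, not the
    paper's implementation.
    """

    def __init__(self, d, prior_var=1.0, noise_var=1.0):
        self.noise_var = noise_var
        # Posterior precision matrix, initialized to the prior precision.
        self.precision = np.eye(d) / prior_var
        # Accumulated phi * target / noise_var; posterior mean = cov @ b.
        self.b = np.zeros(d)

    def update(self, phi, target):
        # Rank-one conjugate update from one (features, regression target) pair.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * target / self.noise_var

    def sample_weights(self, rng):
        # Draw w ~ N(mean, cov) from the current posterior.
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)

    def act(self, features, rng):
        # features: (num_actions, d) rows of phi(s, a).
        # One posterior sample per decision, then greedy under that sample.
        w = self.sample_weights(rng)
        return int(np.argmax(features @ w))


rng = np.random.default_rng(0)
q = LinearPosteriorQ(d=3)
# Toy data: action-1 features consistently yield return 1, action-0 features 0.
for _ in range(200):
    q.update(np.array([0.0, 1.0, 0.0]), target=1.0)
    q.update(np.array([1.0, 0.0, 0.0]), target=0.0)
# With a concentrated posterior, greedy-under-sample usually picks action 1;
# the never-observed action 2 still gets explored via its wide prior.
print(q.act(np.eye(3), rng))
```

The key contrast with epsilon-greedy is that exploration here is directed: actions whose posterior value is uncertain (wide variance) are sampled optimistically often, while well-understood bad actions are quickly abandoned.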

Cited by 84 publications (73 citation statements) · References 26 publications
“…We envisage possible extensions of our approach using probabilistic reinforcement-learning methods including: Bayesian deep reinforcement learning [59][60][61] and model-based reinforcement learning 62,63 , where the goal is to estimate the uncertainty when making a decision and incorporate domain knowledge into the reinforcement-learning model. The resulting reinforcement-Fig.…”
Section: Discussion
confidence: 99%
“…Another alternative solution, by applying Bayesian deep Q-networks (BDQN) is an efficient Thompson sampling based method in high dimensional RL problems. In [8] Azizzadenesheli and Anandkumar studied the behaviour of BDQN and compared it to another method to solve exploration -exploitation trade off. Yet the problem is this method itself is difficult in implementing and time consuming and did not provide a sample efficiency guarantee.…”
Section: Related Work
confidence: 99%
“…These algorithms result from an effort to incorporate Bayesian computations into the deep RL framework, and correspond to a very active trend in the field. Most of these works address discrete actions (Azizzadenesheli et al, 2018;Tang and Kucukelbir, 2017), but d4pg is an exception that derives from adopting a distributional perspective on policy gradient computation, resulting in more accurate estimates on the gradient and better sample efficiency (Bellemare et al, 2017).…”
Section: Overview Of Deep RL Algorithms
confidence: 99%