2018 Information Theory and Applications Workshop (ITA)
DOI: 10.1109/ita.2018.8503252
Efficient Exploration Through Bayesian Deep Q-Networks

Abstract: We study reinforcement learning (RL) in high dimensional episodic Markov decision processes (MDP). We consider value-based RL when the optimal Q-value is a linear function of d-dimensional state-action feature representation. For instance, in deep-Q networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bound…
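The abstract's setting, where Q(s, a) is linear in a d-dimensional feature vector phi(s, a), makes the posterior-sampling idea concrete: maintain a Gaussian posterior over the linear weights via Bayesian linear regression, draw one weight sample per decision, and act greedily under it. The sketch below is illustrative only; the class name, hyperparameters, and update targets are my own assumptions, not the paper's exact LINPSRL algorithm.

```python
import numpy as np

class LinearPosteriorQ:
    """Thompson sampling over a linear Q-value head, Q(s,a) = phi(s,a)^T w.

    Maintains a conjugate Gaussian posterior over w (Bayesian linear
    regression with known noise variance). Hypothetical sketch, not the
    paper's implementation.
    """

    def __init__(self, d, prior_var=1.0, noise_var=1.0):
        self.noise_var = noise_var
        # Posterior precision matrix, initialized to the prior precision.
        self.precision = np.eye(d) / prior_var
        # Accumulated phi * target / noise_var; posterior mean = cov @ b.
        self.b = np.zeros(d)

    def update(self, phi, target):
        # Rank-one conjugate update from one (features, regression target) pair.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * target / self.noise_var

    def sample_weights(self, rng):
        # Draw w ~ N(mean, cov) from the current posterior.
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)

    def act(self, features, rng):
        # features: (num_actions, d) rows of phi(s, a).
        # One posterior sample per decision, then greedy under that sample.
        w = self.sample_weights(rng)
        return int(np.argmax(features @ w))


rng = np.random.default_rng(0)
q = LinearPosteriorQ(d=3)
# Toy data: action-1 features consistently yield return 1, action-0 features 0.
for _ in range(200):
    q.update(np.array([0.0, 1.0, 0.0]), target=1.0)
    q.update(np.array([1.0, 0.0, 0.0]), target=0.0)
# With a concentrated posterior, greedy-under-sample usually picks action 1;
# the never-observed action 2 still gets explored via its wide prior.
print(q.act(np.eye(3), rng))
```

The key contrast with epsilon-greedy is that exploration here is directed: actions whose posterior value is uncertain (wide variance) are sampled optimistically often, while well-understood bad actions are quickly abandoned.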

Cited by 84 publications (73 citation statements) · References 26 publications
“…We envisage possible extensions of our approach using probabilistic reinforcement-learning methods including: Bayesian deep reinforcement learning [59][60][61] and model-based reinforcement learning 62,63 , where the goal is to estimate the uncertainty when making a decision and incorporate domain knowledge into the reinforcement-learning model. The resulting reinforcement-Fig.…”
Section: Discussion
confidence: 99%
“…Another alternative solution, by applying Bayesian deep Q-networks (BDQN) is an efficient Thompson sampling based method in high dimensional RL problems. In [8] Azizzadenesheli and Anandkumar studied the behaviour of BDQN and compared it to another method to solve exploration -exploitation trade off. Yet the problem is this method itself is difficult in implementing and time consuming and did not provide a sample efficiency guarantee.…”
Section: Related Work
confidence: 99%
“…These algorithms result from an effort to incorporate Bayesian computations into the deep RL framework, and correspond to a very active trend in the field. Most of these works address discrete actions (Azizzadenesheli et al, 2018;Tang and Kucukelbir, 2017), but d4pg is an exception that derives from adopting a distributional perspective on policy gradient computation, resulting in more accurate estimates on the gradient and better sample efficiency (Bellemare et al, 2017).…”
Section: Overview Of Deep RL Algorithms
confidence: 99%