2019
DOI: 10.3389/fnbot.2019.00103

Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning

Abstract: A deep Q network (DQN) (Mnih et al., 2013) is an extension of Q-learning and a typical deep reinforcement learning method. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural network. Using the approximated Q function, an optimal policy can be derived. In DQN, a target network, which calculates a target value and is updated by the Q function at regular intervals, is introduced to stabilize the learning process. Less frequent updates of …
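For context, a minimal sketch of the target-network mechanism described in the abstract, assuming a small fully connected network in place of the convolutional network used on images; layer sizes, names, and hyperparameters are illustrative, not taken from the cited paper:

```python
# Sketch of a Q network with a lagged target network, as described in the abstract.
# Shapes, sizes, and the sync interval are illustrative assumptions.
import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_states, n_actions):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_states, 64), nn.ReLU(),
            nn.Linear(64, n_actions),   # one Q value per action
        )

    def forward(self, state):
        return self.layers(state)

q_net = QNetwork(n_states=4, n_actions=2)
target_net = copy.deepcopy(q_net)          # target network: a delayed copy of the Q function

def td_target(reward, next_state, done, gamma=0.99):
    # The target value is computed with the target network, which is held fixed between syncs.
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=-1).values
    return reward + gamma * (1.0 - done) * next_q

def sync_target():
    # Updated from the Q function only at regular intervals, which stabilizes learning.
    target_net.load_state_dict(q_net.state_dict())
```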

Cited by 44 publications (42 citation statements)
References 15 publications (20 reference statements)
“…Q-learning, as a very classical algorithm in RL, is a good example for understanding the purpose of DRL. The big issue with Q-learning is its tabular nature: when the state and action spaces are very large, it cannot build a Q table large enough to store all the Q values [35]. Besides, it counts and iterates Q values based on past states.…”
Section: Deep Reinforcement Learning
Citation type: mentioning, confidence: 99%
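For comparison with the deep variant, a minimal sketch of the tabular Q-learning update this statement refers to; the state/action counts and step sizes are illustrative assumptions:

```python
# Tabular Q-learning: the Q table grows with |S| x |A|, which is what becomes
# infeasible for very large state and action spaces.
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))        # one entry per (state, action) pair

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Standard one-step Q-learning update on the table entry (s, a).
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```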
“…The Deep Q-Learning Network (DQN) is a way of modeling the environment and calculating the collision energy function, which is the main cause of a loss in functionality (Ohnishi et al., 2019). To realize the path planning process, the neural network is trained to minimize the loss function through the gradient descent method.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
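A minimal sketch of the TD loss that is minimized by gradient descent in DQN, as referenced above; the batch layout is an assumption, and the Huber (smooth L1) loss is a common choice rather than necessarily the one used in the citing paper:

```python
# One-step TD loss for DQN: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a').
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch   # tensors, actions are int64
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1.0 - dones) * target_net(next_states).max(1).values
    return F.smooth_l1_loss(q_sa, target)   # minimized with an optimizer such as SGD or Adam
```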
“…Therefore, the study by Pohlen et al. [15] was considered to alleviate the instability of the learning process. Ohnishi et al. [16] proposed constrained DQN, which behaves in two different ways: when the difference between the maximum value of the Q-function and the value of the target network is large, constrained DQN updates the Q-function more conservatively, and when this difference is small, constrained DQN behaves like conventional Q-learning. Studies [14][15][16] provide a family of target-based TD-learning algorithms [17].…”
Citation type: mentioning, confidence: 99%
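One rough way to read the behavior described above, as an illustrative sketch only (the bounding rule, the margin parameter, and the tensor inputs are assumptions, not the authors' exact update rule): the bootstrap value is kept near the target network when the online Q-function runs far ahead of it, and reduces to the ordinary Q-learning target when the two agree.

```python
# Illustrative constrained bootstrap target: conservative when the Q-function and
# the target network disagree strongly, ordinary Q-learning otherwise.
import torch

def constrained_target(q_next_max, target_next_max, reward, gamma=0.99, margin=1.0):
    # q_next_max, target_next_max, reward: tensors of shape (batch,)
    gap = q_next_max - target_next_max              # how far the Q-function runs ahead
    bounded_next = target_next_max + torch.clamp(gap, -margin, margin)
    # small gap  -> bounded_next ~ q_next_max (ordinary Q-learning target)
    # large gap  -> bounded_next stays within `margin` of the target network (conservative)
    return reward + gamma * bounded_next
```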
“…Ohnishi et al. [16] proposed constrained DQN, which behaves in two different ways: when the difference between the maximum value of the Q-function and the value of the target network is large, constrained DQN updates the Q-function more conservatively, and when this difference is small, constrained DQN behaves like conventional Q-learning. Studies [14][15][16] provide a family of target-based TD-learning algorithms [17]. Study [17] showed that a separate target network is indispensable to the success of deep Q-learning and improves the performance of Q-learning, provided insight into the theoretical approaches, and introduced three different update methods: averaging TD, double TD, and periodic TD, where the target network is updated in an averaging, symmetric, or periodic manner, respectively.…”
Citation type: mentioning, confidence: 99%
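A minimal sketch contrasting two of the target-update schemes named above, a periodic (hard) copy versus an averaging (Polyak-style) update; the period and mixing rate are illustrative assumptions, and double TD is omitted for brevity:

```python
# Two common ways to move the target network toward the online Q network.
import torch.nn as nn

def periodic_update(q_net: nn.Module, target_net: nn.Module, step: int, period: int = 1000):
    # Copy the whole Q network into the target network every `period` steps.
    if step % period == 0:
        target_net.load_state_dict(q_net.state_dict())

def averaging_update(q_net: nn.Module, target_net: nn.Module, tau: float = 0.005):
    # Move the target network a small fraction toward the Q network at every step.
    for p_t, p_q in zip(target_net.parameters(), q_net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p_q.data)
```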