2016
DOI: 10.1609/aaai.v30i1.10295
Deep Reinforcement Learning with Double Q-Learning

Abstract: The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. …
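For readers skimming the abstract, the change the paper proposes can be summarized by contrasting the two learning targets, written here in the paper's notation (online parameters \theta_t, target-network parameters \theta_t^-):

```latex
% Standard DQN target: the target network both selects and evaluates the action,
% which is the source of the overestimation bias.
Y_t^{\mathrm{DQN}} = R_{t+1} + \gamma \max_a Q(S_{t+1}, a;\, \theta_t^-)

% Double DQN target: the online network selects the action, the target network
% evaluates it, decoupling action selection from action evaluation.
Y_t^{\mathrm{DoubleDQN}} = R_{t+1} + \gamma\, Q\!\big(S_{t+1}, \operatorname*{arg\,max}_a Q(S_{t+1}, a;\, \theta_t);\, \theta_t^-\big)
```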

Cited by 2,865 publications (924 citation statements).
References 8 publications.
“…Moreover, |Ac| is the size of the action space. To make our RL agent more robust, to stabilize learning, and to handle the problem of overestimation of Q-values, a double Q-network [46] and fixed Q-targets [47] were also incorporated, where TD is the temporal difference and the target network is a second dueling DQN whose parameters were held fixed and copied from the online dueling DQN every m steps (m = 20). To update the parameters of the dueling DQN, as shown in Figure 7, we trained our RL agent by minimizing the loss function, where E is the expectation.…”
Section: Methods (mentioning)
Confidence: 99%
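The statement above describes a dueling DQN paired with a fixed Q-target copy synced every m = 20 steps. Below is a minimal PyTorch sketch of that setup; the layer sizes, state and action dimensions, and the sync wrapper are illustrative assumptions rather than details of the cited work (only m = 20 is taken from the text).

```python
# Minimal sketch: a dueling Q-network plus a frozen target copy re-synced every
# m = 20 training steps (the "fixed Q-targets" trick). Sizes are placeholders.
import copy
import torch
import torch.nn as nn


class DuelingDQN(nn.Module):
    def __init__(self, state_dim: int = 8, n_actions: int = 4, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.feature(s)
        v, a = self.value(h), self.advantage(h)
        # combine streams: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)


online = DuelingDQN()
target = copy.deepcopy(online)   # fixed Q-target network; never trained directly


def maybe_sync_target(step: int, m: int = 20) -> None:
    # copy the online parameters into the frozen target every m steps
    if step % m == 0:
        target.load_state_dict(online.state_dict())
```

The TD target and loss then follow the Double DQN form shown after the abstract, with the frozen copy playing the role of \theta^-.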
“…To combine FBDD with our RL framework, we first collected and built a SARS-CoV-2 3CLpro inhibitor dataset containing 284 reported molecules. We adopted an improved BRICS algorithm [46] to split these molecules and obtain a fragment library targeting SARS-CoV-2 3CLpro, as demonstrated in the flowchart in Figure 1 (yellow box). An elaborate filtering cascade was accompanied by manual inspection, and the rules can be changed to suit the needs of different studies.…”
Section: Methods (mentioning)
Confidence: 99%
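As a rough illustration of the fragmentation step mentioned above, the sketch below uses RDKit's standard BRICS decomposition on a placeholder molecule; the cited work uses an improved BRICS variant and its own 284-compound dataset, neither of which is reproduced here.

```python
# Sketch of BRICS-based fragmentation with RDKit: split input molecules into
# fragments to seed a fragment library. The example SMILES is a placeholder.
from rdkit import Chem
from rdkit.Chem import BRICS

smiles_library = ["CC(=O)Nc1ccc(O)cc1"]   # placeholder input, not the reported inhibitor set

fragments = set()
for smi in smiles_library:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:          # basic sanity filter; real pipelines add more rules
        continue
    fragments.update(BRICS.BRICSDecompose(mol))

# fragment SMILES carry BRICS attachment points written as dummy atoms, e.g. [1*]
print(sorted(fragments))
```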
“…A deep Q-network approximates the Q-function with a neural network. In this article, we specifically used double DQN (Van Hasselt et al., 2016), in which a target network is used to compute the loss between the current and desired predictions of the Q-values. This loss is then used to update the weights of the neural network representing the agent.…”
Section: Knowledge-guided Reinforcement Learning (mentioning)
Confidence: 99%
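The update this citation describes can be sketched as follows: the target network supplies the desired Q-value, the online network the current prediction, and their squared error drives the weight update. The network shape, optimizer, and discount factor below are assumptions for the example, not details of the cited article.

```python
# Sketch of one Double DQN update step with a frozen target network.
import copy
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
q_target = copy.deepcopy(q_net)                           # frozen copy of the agent network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)


def double_dqn_update(s, a, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        # online network selects the next action, target network evaluates it
        a_next = q_net(s_next).argmax(dim=1, keepdim=True)
        y = r + gamma * (1.0 - done) * q_target(s_next).gather(1, a_next).squeeze(1)
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # current prediction Q(s, a)
    loss = nn.functional.mse_loss(q_pred, y)                # loss between current and desired values
    optimizer.zero_grad()
    loss.backward()                                         # gradients flow only through q_net
    optimizer.step()
    return loss.item()
```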
“…RL usually consists of value-based methods and policy-based methods. Value-based methods approximate the value function with tables or neural networks, as in DQN [17], Dueling-DQN [18], and Double DQN [19]. Tabular value-based Q-learning, however, explodes as the dimensionality of the state space increases.…”
Section: Introduction (mentioning)
Confidence: 99%
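To make the scaling point concrete, here is a tiny sketch of the tabular value-based approach: the Q-table holds one entry per (state, action) pair, so its size grows exponentially with the number of state variables, which is what motivates the DQN-family function approximators named above. The learning rate, discount, and action count are illustrative assumptions.

```python
# Tabular Q-learning update: one table entry per (state, action) pair.
from collections import defaultdict

Q = defaultdict(float)                 # maps (state, action) -> estimated value
alpha, gamma, n_actions = 0.1, 0.99, 4


def q_learning_update(s, a, r, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in range(n_actions))
    # classic target: r + gamma * max_a' Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# With d discrete state variables of k values each the table needs k**d rows,
# so neural-network approximators (DQN, Dueling-DQN, Double DQN) replace it
# when the state space is large.
```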