2018
DOI: 10.1109/tnnls.2018.2806087
Multisource Transfer Double DQN Based on Actor Learning

Abstract: Deep reinforcement learning (RL) comprehensively uses the psychological mechanisms of "trial and error" and "reward and punishment" in RL as well as powerful feature expression and nonlinear mapping in deep learning. Currently, it plays an essential role in the fields of artificial intelligence and machine learning. Since an RL agent needs to constantly interact with its surroundings, the deep Q network (DQN) is inevitably faced with the need to learn numerous network parameters, which results in low learning …

Cited by 82 publications (20 citation statements)
References 23 publications
“…In this section, the network structure consisting of three convolutional layers and one fully connected layer is proposed [16]. The CNN framework is shown in Figure 4.…”
Section: Journal of Robotics
confidence: 99%
“…If the overestimation continues to occur during training, the policy update will be negatively affected. These features have led to the emergence of techniques called Double Q-Learning and Double DQN, which use two value networks to separate action selection from Q-value estimation [23]. TD3 (Twin Delayed Deep Deterministic Policy Gradient) is an algorithm that applies several such techniques to DDPG to prevent overestimation of the value function.…”
Section: TD3
confidence: 99%
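The decoupling described above can be sketched in a few lines. This is a minimal illustration of the Double DQN target, not the cited paper's implementation: the online network *selects* the greedy next action, while a separate target network *evaluates* it, which dampens the maximization bias of vanilla DQN. The function name and the toy lambda "networks" are hypothetical stand-ins.

```python
import numpy as np

def double_dqn_target(q_online, q_target, next_states, rewards, dones, gamma=0.99):
    """Double DQN target: online net picks the action, target net scores it."""
    best_actions = np.argmax(q_online(next_states), axis=1)            # selection
    q_next = q_target(next_states)[np.arange(len(next_states)), best_actions]  # evaluation
    return rewards + gamma * (1.0 - dones) * q_next

# Toy check with fixed Q tables standing in for the two networks:
q_online = lambda s: np.array([[1.0, 2.0], [3.0, 0.5]])
q_target = lambda s: np.array([[0.5, 1.5], [2.0, 0.2]])
targets = double_dqn_target(q_online, q_target,
                            next_states=np.zeros(2),
                            rewards=np.array([1.0, 1.0]),
                            dones=np.array([0.0, 1.0]))
# State 0: online net picks action 1 (Q=2.0), target net evaluates it (1.5),
# so the target is 1.0 + 0.99 * 1.5 = 2.485; state 1 is terminal, so 1.0.
# → [2.485, 1.0]
```

In vanilla DQN both roles fall to the same network (`max` over `q_target`), so any upward noise in a Q estimate is both selected and propagated; splitting the two roles is what the quoted passage means by "separate action selection from Q-value estimation".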
“…However, DQN would overestimate the action-state value function during the training process, which affects the final decision and fails to obtain the optimal strategy. To solve this problem, many scholars put forward improved algorithms such as double DQN [30], dueling DQN [31], and rainbow DQN [32]. As far as we know, the double DQN plays a role in tuning optimization of the parameters for LADRC in this article for the first time.…”
Section: Introduction
confidence: 99%