2022
DOI: 10.48550/arxiv.2202.12504
Preprint
Consolidated Adaptive T-soft Update for Deep Reinforcement Learning

Abstract: Demand for deep reinforcement learning (DRL) is gradually increasing to enable robots to perform complex tasks, while DRL is known to be unstable. As a technique to stabilize its learning, a target network that slowly and asymptotically matches a main network is widely employed to generate stable pseudo-supervised signals. Recently, T-soft update has been proposed as a noise-robust update rule for the target network and has contributed to improving the DRL performance. However, the noise robustness of T-soft up…

Cited by 1 publication (12 citation statements)
References 17 publications (30 reference statements)
“…Initially, the "hard" update strategy of copying the main Q-network to the target-network after a certain period of time was used [5]. Since then, the "soft" update strategy has been used, interpolating the parameters of the target-network with a fixed ratio between the current parameters of the target-network and the parameters of the main Q-network [10][11][12][13][14].…”
Section: Introduction
confidence: 99%
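The hard and soft update rules described in the citation above can be sketched as follows. This is a minimal illustration with toy parameter vectors; the function names and shapes are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network" parameters represented as flat vectors.
main = rng.normal(size=4)   # parameters of the main Q-network
target = np.zeros(4)        # parameters of the target network

def hard_update(target, main):
    """Hard update: copy the main parameters into the target wholesale,
    typically once every fixed number of steps."""
    target[:] = main

def soft_update(target, main, tau=0.01):
    """Soft update: interpolate the target toward the main parameters
    with a fixed ratio tau at every step."""
    target[:] = (1.0 - tau) * target + tau * main

soft_update(target, main, tau=0.5)  # target moves halfway toward main
hard_update(target, main)           # target now equals main exactly
```

With a small `tau`, the target network changes only slightly per step, which is what makes it a slowly, asymptotically matching copy of the main network.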
“…However, these methods exhibit several limitations. In addition to their slow learning speed, they are sensitive to noise and outliers in the parameter updates of the main Q-network [11][12][13]. A simple solution to this problem is to reduce the copy ratio; however, this makes learning considerably slower.…”
Section: Introduction
confidence: 99%
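The trade-off noted above, that shrinking the interpolation ratio slows learning, can be demonstrated with a toy scalar parameter. The helper below is illustrative only (not code from the paper): it counts how many soft-update steps the target needs to come within a tolerance of a fixed main parameter:

```python
def steps_to_converge(tau, tol=0.01):
    """Count soft-update steps until the target is within tol of main."""
    target, main = 0.0, 1.0
    steps = 0
    while abs(main - target) > tol:
        target = (1.0 - tau) * target + tau * main
        steps += 1
    return steps

fast = steps_to_converge(tau=0.5)   # large ratio: quick tracking
slow = steps_to_converge(tau=0.01)  # small ratio: smoother, but far slower
```

The gap is stark: with `tau=0.5` the target converges in a handful of steps, while `tau=0.01` needs hundreds, which is the slow-learning cost that motivates noise-robust rules such as T-soft update.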