2020
DOI: 10.1109/tnnls.2019.2959129
Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient

Cited by 51 publications (44 citation statements)
References 11 publications
“…A novel DRL approach, combining TDD [34] and ND, is proposed to address the co-optimization problem. TDD-ND is a model-free, off-policy actor-critic algorithm, in which the triplet critics are used to limit estimation bias, and the exploration ND policy is used to improve the exploration in the algorithm.…”
Section: Proposed Triplet Deep Deterministic Policy Gradient With Exploration Noise Decay Approach
Confidence: 99%
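The statement above credits the triplet critics with limiting estimation bias. Below is a minimal PyTorch sketch of one way three target-critic estimates might be combined into a TD target: the minimum over two critics damps overestimation, and blending in a third critic tempers the resulting pessimism. The function name `triplet_target`, the weight `beta`, and the exact combination rule are illustrative assumptions, not the update from the cited paper.

```python
import torch

def triplet_target(q1, q2, q3, reward, not_done, gamma=0.99, beta=0.5):
    """q1, q2, q3: target-critic values Q_i(s', pi'(s')) for a batch.

    Hypothetical triplet-critic target in the spirit of the statement
    above; beta and the blending rule are assumptions.
    """
    pessimistic = torch.min(q1, q2)                    # clipped double-Q lower bound
    averaged = (1.0 - beta) * pessimistic + beta * q3  # temper under/overestimation
    return reward + gamma * not_done * averaged

# Example: a batch of 4 transitions with random critic outputs.
q = torch.rand(3, 4)
y = triplet_target(q[0], q[1], q[2],
                   reward=torch.ones(4), not_done=torch.ones(4))
```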
See 4 more Smart Citations
“…The TDD algorithm [34] is an off-policy RL algorithm which can be applied to solve optimization problems with a continuous state space as well as continuous actions [35,36]. TDD includes a single actor network (i.e., a deterministic policy network) π_φ and its target actor network π_φ′.…”
Section: Triplet Deep Deterministic Policy Gradient Algorithm
Confidence: 99%
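The statement mentions a single actor network π_φ paired with a target actor π_φ′. A minimal PyTorch sketch of such a pair with Polyak (soft) target updates follows; the layer sizes, `tau`, and the `soft_update` helper are assumptions for illustration, not details taken from the cited paper.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy pi_phi mapping states to bounded actions."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

actor = Actor(state_dim=8, action_dim=2)
actor_target = copy.deepcopy(actor)  # pi_phi' starts as an exact copy

def soft_update(net, target, tau=0.005):
    # Polyak averaging keeps the target network a stable, lagged copy.
    with torch.no_grad():
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)

soft_update(actor, actor_target)
```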
See 3 more Smart Citations