2023
DOI: 10.1049/cit2.12202
|View full text |Cite
|
Sign up to set email alerts
|

Deep reinforcement learning using least‐squares truncated temporal‐difference

Abstract: Policy evaluation (PE) is a critical sub‐problem in reinforcement learning, which estimates the value function for a given policy and can be used for policy improvement. However, there still exist some limitations in current PE methods, such as low sample efficiency and local convergence, especially on complex tasks. In this study, a novel PE algorithm called Least‐Squares Truncated Temporal‐Difference learning (LST2D) is proposed. In LST2D, an adaptive truncation mechanism is designed, which effectively takes… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
2

Relationship

0
2

Authors

Journals

citations
Cited by 2 publications
(1 citation statement)
references
References 35 publications
0
1
0
Order By: Relevance
“…This conventional observer based method will reduce the following performance in the complex traffic environment. At the same time, more and more artificial intelligence methods, represented by neural network [19] and reinforcement learning [20][21][22], have been introduced into the research of carfollowing scenario. This is because in the training process, the reinforcement learning algorithm can maximise the cumulative reward by interacting with the environment, so as to obtain the optimal strategy according to the changes of the environment [23].…”
Section: Introductionmentioning
confidence: 99%
“…This conventional observer based method will reduce the following performance in the complex traffic environment. At the same time, more and more artificial intelligence methods, represented by neural network [19] and reinforcement learning [20][21][22], have been introduced into the research of carfollowing scenario. This is because in the training process, the reinforcement learning algorithm can maximise the cumulative reward by interacting with the environment, so as to obtain the optimal strategy according to the changes of the environment [23].…”
Section: Introductionmentioning
confidence: 99%