2023
DOI: 10.3390/electronics12194176

A Reinforcement Learning Method of Solving Markov Decision Processes: An Adaptive Exploration Model Based on Temporal Difference Error

Xianjia Wang, Zhipeng Yang, Guici Chen, et al.

Abstract: Traditional backward recursion methods face a fundamental challenge in solving Markov Decision Processes (MDPs), where there exists a contradiction between the need for knowledge of optimal expected payoffs and the inability to acquire such knowledge during the decision-making process. To address this challenge and strike a reasonable balance between exploration and exploitation in the decision process, this paper proposes a novel model known as Temporal Error-based Adaptive Exploration (TEAE). Leveraging reinf…
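The abstract describes coupling exploration to the temporal-difference (TD) error but does not give TEAE's actual update rules. The sketch below is therefore only an assumed illustration of the general principle, not the authors' method: a tabular Q-learning agent whose epsilon-greedy exploration rate grows with the running magnitude of its TD errors, so it explores more where its value estimates are still unreliable. The class and parameter names (TDErrorAdaptiveAgent, sensitivity, eps_min, eps_max) are hypothetical.

```python
import numpy as np

class TDErrorAdaptiveAgent:
    """Tabular Q-learning with exploration driven by recent |TD error|.

    Illustrative sketch only: TEAE's real update rules are not given in the
    abstract, so this particular coupling is an assumption.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99,
                 eps_min=0.05, eps_max=1.0, sensitivity=1.0, seed=None):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.eps_min, self.eps_max = eps_min, eps_max
        self.sensitivity = sensitivity      # how strongly TD error drives exploration
        self.avg_abs_td = 0.0               # running average of |TD error|
        self.rng = np.random.default_rng(seed)

    def epsilon(self):
        # Large recent TD errors -> value estimates are unreliable -> explore more.
        return self.eps_min + (self.eps_max - self.eps_min) * float(
            np.tanh(self.sensitivity * self.avg_abs_td))

    def act(self, state):
        if self.rng.random() < self.epsilon():
            return int(self.rng.integers(self.Q.shape[1]))   # explore
        return int(np.argmax(self.Q[state]))                 # exploit

    def update(self, s, a, r, s_next, done):
        target = r + (0.0 if done else self.gamma * np.max(self.Q[s_next]))
        td_error = target - self.Q[s, a]
        self.Q[s, a] += self.alpha * td_error
        # Exponential moving average of |TD error| feeds back into epsilon().
        self.avg_abs_td = 0.95 * self.avg_abs_td + 0.05 * abs(td_error)
        return td_error
```

In this reading, exploration decays automatically as value estimates stabilize, rather than following a fixed annealing schedule.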

Cited by 2 publications (3 citation statements)
References 41 publications
“…Transforming the multi-intersection traffic signal control problem within the region into interactions between individual intersections and neighboring intersections enhances the convergence speed of offline learning algorithms [20]. During the experience replay process, it allows for faster learning of the optimal policy from non-current strategy experiences [21] and leads to a more stable convergence of the optimal policy [22]. All the algorithms designed in this study utilize the same neural network structure. As depicted in Figure 5, it is evident that the performance significantly improved when employing the Mean Field Reinforcement Learning control method.…”
Section: Simulation Experiments Results Analysis
Mentioning confidence: 99%
“…Transforming the multi-intersection traffic signal control problem within the region into interactions between individual intersections and neighboring intersections enhances the convergence speed of offline learning algorithms [20]. During the experience replay process, it allows for faster learning of the optimal policy from non-current strategy experiences [21] and leads to a more stable convergence of the optimal policy [22]. MFQ-ATSC starts converging after 100-150 iterations, while MFAC-ATSC still has a few intersections that have not reached convergence even after 400 iterations.…”
Section: Simulation Experiments Results Analysis
Mentioning confidence: 99%
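The citing work above attributes faster and more stable convergence to experience replay, i.e. reusing transitions generated under earlier ("non-current") policies for off-policy updates. As a rough illustration of that mechanism only, and not code from either the citing or the cited paper, a uniform replay buffer can be sketched as follows; the ReplayBuffer name, capacity, and batch size are hypothetical choices.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # old transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        # Transitions may come from older behaviour policies ("non-current strategies").
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniformly sampling past transitions lets an off-policy learner reuse
        # old experience, which typically smooths and stabilises convergence.
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        return list(zip(*batch))   # [states, actions, rewards, next_states, dones]

    def __len__(self):
        return len(self.buffer)
```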
“…In this way, it is possible to look for relationships in historical data from past to future or from future to past and possibly associate them appropriately with the last timestep. The last timestep represents the source of information in the classical Markov decision process (MDP) [26]. A state vector representing the local memory of the model is fed to the model’s input.…”
Section: Methods
Mentioning confidence: 99%