2023
DOI: 10.3390/electronics12194176

A Reinforcement Learning Method of Solving Markov Decision Processes: An Adaptive Exploration Model Based on Temporal Difference Error

Xianjia Wang, Zhipeng Yang, Guici Chen, et al.

Abstract: Traditional backward recursion methods face a fundamental challenge in solving Markov Decision Processes (MDPs), where there exists a contradiction between the need for knowledge of optimal expected payoffs and the inability to acquire such knowledge during the decision-making process. To address this challenge and strike a reasonable balance between exploration and exploitation in the decision process, this paper proposes a novel model known as Temporal Error-based Adaptive Exploration (TEAE). Leveraging reinf…
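The abstract describes coupling exploration to the temporal-difference (TD) error but does not give TEAE's actual update rules. The sketch below is therefore only an assumed illustration of the general principle, not the authors' method: a tabular Q-learning agent whose epsilon-greedy exploration rate grows with the running magnitude of its TD errors, so it explores more where its value estimates are still unreliable. The class and parameter names (TDErrorAdaptiveAgent, sensitivity, eps_min, eps_max) are hypothetical.

```python
import numpy as np

class TDErrorAdaptiveAgent:
    """Tabular Q-learning with exploration driven by recent |TD error|.

    Illustrative sketch only: TEAE's real update rules are not given in the
    abstract, so this particular coupling is an assumption.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99,
                 eps_min=0.05, eps_max=1.0, sensitivity=1.0, seed=None):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.eps_min, self.eps_max = eps_min, eps_max
        self.sensitivity = sensitivity      # how strongly TD error drives exploration
        self.avg_abs_td = 0.0               # running average of |TD error|
        self.rng = np.random.default_rng(seed)

    def epsilon(self):
        # Large recent TD errors -> value estimates are unreliable -> explore more.
        return self.eps_min + (self.eps_max - self.eps_min) * float(
            np.tanh(self.sensitivity * self.avg_abs_td))

    def act(self, state):
        if self.rng.random() < self.epsilon():
            return int(self.rng.integers(self.Q.shape[1]))   # explore
        return int(np.argmax(self.Q[state]))                 # exploit

    def update(self, s, a, r, s_next, done):
        target = r + (0.0 if done else self.gamma * np.max(self.Q[s_next]))
        td_error = target - self.Q[s, a]
        self.Q[s, a] += self.alpha * td_error
        # Exponential moving average of |TD error| feeds back into epsilon().
        self.avg_abs_td = 0.95 * self.avg_abs_td + 0.05 * abs(td_error)
        return td_error
```

In this reading, exploration decays automatically as value estimates stabilize, rather than following a fixed annealing schedule.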

Cited by 2 publications (3 citation statements)
References 41 publications
“…Transforming the multi-intersection traffic signal control problem within the region into interactions between individual intersections and neighboring intersections enhances the convergence speed of offline learning algorithms [20]. During the experience replay process, it allows for faster learning of the optimal policy from non-current strategy experiences [21] and leads to a more stable convergence of the optimal policy [22]. All the algorithms designed in this study utilize the same neural network structure. As depicted in Figure 5, it is evident that the performance significantly improved when employing the Mean Field Reinforcement Learning control method.…”
Section: Simulation Experiments Results Analysis
Mentioning confidence: 99%
“…Transforming the multi-intersection traffic signal control problem within the region into interactions between individual intersections and neighboring intersections enhances the convergence speed of offline learning algorithms [20]. During the experience replay process, it allows for faster learning of the optimal policy from non-current strategy experiences [21] and leads to a more stable convergence of the optimal policy [22]. MFQ-ATSC starts converging after 100-150 iterations, while MFAC-ATSC still has a few intersections that have not reached convergence even after 400 iterations.…”
Section: Simulation Experiments Results Analysis
Mentioning confidence: 99%
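The citing work above attributes faster and more stable convergence to experience replay, i.e. reusing transitions generated under earlier ("non-current") policies for off-policy updates. As a rough illustration of that mechanism only, and not code from either the citing or the cited paper, a uniform replay buffer can be sketched as follows; the ReplayBuffer name, capacity, and batch size are hypothetical choices.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # old transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        # Transitions may come from older behaviour policies ("non-current strategies").
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniformly sampling past transitions lets an off-policy learner reuse
        # old experience, which typically smooths and stabilises convergence.
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        return list(zip(*batch))   # [states, actions, rewards, next_states, dones]

    def __len__(self):
        return len(self.buffer)
```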
“…In this way, it is possible to look for relationships in historical data from past to future or from future to past and possibly associate them appropriately with the last timestep. The last timestep represents the source of information in the classical Markov decision process (MDP) [26]. A state vector representing the local memory of the model is fed to the model’s input.…”
Section: Methods
Mentioning confidence: 99%