Reinforcement learning (RL) has shown outstanding performance on complex tasks in recent years. The eligibility trace (ET), a fundamental mechanism in RL, records critical states with attenuation and guides policy updates, which plays a crucial role in accelerating the convergence of RL training. However, implementing ET on conventional digital computing hardware is energy-hungry and restricted by the memory wall because of the massive computation of exponential decay functions. Here, an in-memory realization of ET for energy-efficient reinforcement learning is demonstrated, with outstanding performance in discrete- and continuous-state RL tasks. For the first time, the inherent conductance drift of phase change memory is exploited as a physical decay function to realize an in-memory eligibility trace, demonstrating excellent performance during RL training in various tasks. Spontaneous in-memory decay computing and storage of the policy in the same phase change memory give rise to significantly enhanced energy efficiency compared with traditional graphics processing unit platforms. This work therefore provides a holistic, energy- and hardware-efficient method for both training and inference of reinforcement learning.
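For readers unfamiliar with eligibility traces, the software computation that the abstract describes as costly can be sketched as follows. This is a minimal tabular TD(lambda) example with accumulating traces, not the authors' hardware method; the function name `td_lambda_step`, the four-state chain, and the values of `alpha`, `gamma`, and `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def td_lambda_step(values, traces, state, reward, next_state, done,
                   alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular TD(lambda) update with accumulating eligibility traces.

    Hypothetical helper for illustration; hyperparameters are assumed values.
    """
    # Explicit exponential decay of every trace on every step. This is the
    # per-step multiply that digital hardware has to compute and store.
    traces *= gamma * lam
    # Mark the state just visited as eligible for credit.
    traces[state] += 1.0
    # Temporal-difference error for the observed transition.
    target = reward if done else reward + gamma * values[next_state]
    td_error = target - values[state]
    # Recently visited states receive a share of the update in proportion
    # to their decayed trace.
    values += alpha * td_error * traces
    return td_error

# Toy episode on a 4-state chain: 0 -> 1 -> 2 -> terminal (state 3),
# with a reward of 1 on the final transition.
values = np.zeros(4)
traces = np.zeros(4)
episode = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 3, True)]
for state, reward, next_state, done in episode:
    td_lambda_step(values, traces, state, reward, next_state, done)

print(values)
```

After the single rewarded transition, states 0 and 1 also gain value because their traces have decayed but not vanished, which is how the trace speeds up credit assignment.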
Fast road emergency response can minimize the losses caused by traffic accidents. However, emergency rescue on urban arterial roads faces a high probability of accident-induced congestion, which complicates rescue path planning. This paper proposes a refined path planning method for emergency rescue vehicles on congested urban arterial roads during traffic accidents. First, a rescue path planning environment for emergency vehicles on congested urban arterial roads is established based on the Markov decision process; it focuses on the architecture of arterial roads and takes traffic efficiency and vehicle queue length into account. Then, the prioritized experience replay deep Q-network (PERDQN) reinforcement learning algorithm is used for path planning under different traffic control schemes. The proposed method is tested on a section of East Youyi Road in Xi’an, Shaanxi Province, China. The results show that, compared with the traditional shortest path method, the rescue route planned by PERDQN reduces the arrival time to the accident site by 67.1% and shortens the queue length upstream of the accident point by 16.3%. This shows that the proposed method can plan rescue paths for emergency vehicles on congested urban arterial roads, shorten the arrival time, and reduce the vehicle queue length caused by accidents.
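The abstract does not give implementation details for PERDQN, so the following is only a rough sketch of the prioritized replay component, assuming the common proportional variant in which a transition is sampled with probability proportional to its priority raised to a power `alpha`, and importance-sampling weights correct the resulting bias. The class name, the hyperparameters (`alpha`, `beta`, `eps`), and the placeholder transitions are hypothetical and do not reflect the paper's road-network state encoding.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay; a simplified, hypothetical sketch."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=0):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives full prioritization
        self.eps = eps          # keeps every priority strictly positive
        self.transitions = []
        self.priorities = np.zeros(capacity)
        self.next_slot = 0
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so that each one
        # is likely to be replayed at least once.
        stored = len(self.transitions)
        max_priority = self.priorities[:stored].max() if stored else 1.0
        if stored < self.capacity:
            self.transitions.append(transition)
        else:
            self.transitions[self.next_slot] = transition  # overwrite oldest
        self.priorities[self.next_slot] = max_priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def probabilities(self):
        scaled = self.priorities[:len(self.transitions)] ** self.alpha
        return scaled / scaled.sum()

    def sample(self, batch_size, beta=0.4):
        probs = self.probabilities()
        indices = self.rng.choice(len(probs), size=batch_size, p=probs)
        # Importance-sampling weights, normalized so the largest equals 1.
        weights = (len(probs) * probs[indices]) ** (-beta)
        weights /= weights.max()
        batch = [self.transitions[i] for i in indices]
        return indices, batch, weights

    def update_priorities(self, indices, td_errors):
        # Larger TD error means the transition is replayed more often.
        self.priorities[indices] = np.abs(td_errors) + self.eps

# Placeholder transitions stand in for (state, action, reward, next_state).
buffer = PrioritizedReplayBuffer(capacity=4)
for step in range(4):
    buffer.add(("transition", step))
buffer.update_priorities(np.array([0, 1, 2, 3]), np.array([0.0, 0.5, 1.0, 2.0]))
probs = buffer.probabilities()
indices, batch, weights = buffer.sample(batch_size=2)
```

In a DQN training loop, the returned `weights` would scale each sample's loss, and `update_priorities` would be called with the fresh TD errors after each gradient step. A production buffer would typically use a sum-tree for sampling rather than recomputing the full distribution.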