2020
DOI: 10.1016/j.ins.2020.03.105

Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving

Cited by 61 publications (25 citation statements)
References 16 publications
“…PSO is a random search algorithm based on group collaboration, developed by simulating the foraging behavior of bird swarms (Santosh and Ashok, 2020; Zhang and Huang, 2020; Zhang and Liu, 2019). Because of its convergence drawbacks, research on PSO has mainly focused on improving and optimizing the population structure and the corresponding parameters (Li et al., 2020; Esmat et al., 2020). To solve the path planning problem in an unknown environment and improve the convergence speed of the PSO algorithm, Di et al. (2020) proposed an improved PSO method based on a bionic neural network, using the bionic neural network to train the PSO algorithm.…”
Section: Related Work
confidence: 99%
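
A minimal sketch of the baseline PSO loop described in this statement, written in Python/NumPy for illustration only; it is not the improved bionic-neural-network variant proposed by Di et al. (2020), and the function name `pso`, the hyperparameters `w`, `c1`, `c2`, `n_particles`, and the sphere-function example are all assumptions.

```python
# Hypothetical, minimal particle swarm optimization sketch (illustration only).
import numpy as np

def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()                                   # personal best positions
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()         # global best position

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive pull toward pbest + social pull toward gbest.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Example: minimize the sphere function in 5 dimensions.
best_x, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=5, bounds=(-5.0, 5.0))
```

The slow convergence such a plain loop can show on harder objectives is exactly what the parameter- and structure-oriented improvements cited above target.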
“…Then, every N steps, the current value network parameters are copied to the target value network parameters, which stabilizes the training process and makes the model easier to converge. Considering that the traditional DQN algorithm suffers from over-estimation of the Q value, weak directivity, and poor stability, several improved DQN methods have been proposed [42][43][44].…”
Section: Deep Q-network
confidence: 99%
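
The periodic parameter copy described in this statement can be sketched in PyTorch as follows; this is a generic illustration, not the authors' implementation, and the sync interval `N`, network sizes, discount factor, and the placeholder transition batch are assumptions (a real agent would sample from a replay buffer and handle terminal states).

```python
# Hypothetical sketch of DQN training with a periodically synced target network.
import torch
import torch.nn as nn

def make_q_net(state_dim=4, n_actions=2):
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

q_net = make_q_net()                          # current value network
target_net = make_q_net()                     # target value network
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
N, gamma = 100, 0.99                          # sync interval and discount factor

for step in range(1, 1001):
    # Placeholder transition batch (stand-in for a replay-buffer sample).
    s = torch.randn(32, 4); a = torch.randint(0, 2, (32, 1))
    r = torch.randn(32, 1); s_next = torch.randn(32, 4)

    with torch.no_grad():                     # target computed from the frozen target network
        y = r + gamma * target_net(s_next).max(dim=1, keepdim=True).values
    q = q_net(s).gather(1, a)                 # Q value of the action actually taken
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

    if step % N == 0:                         # every N steps, copy parameters into the target network
        target_net.load_state_dict(q_net.state_dict())
```

Because the target still takes a max over the network's own value estimates, this vanilla scheme retains the over-estimation tendency the statement mentions, which is what the cited improved DQN methods address.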
“…Concerning the critic network, the objective is to minimize the difference between the Q value calculated in the current state and the target Q value, which can be updated by the loss function as follows [45]:

$$y_t = r(s_t, a_t) + \gamma\, Q\!\left(s_{t+1}, u(s_{t+1}) \mid \theta^{\omega}\right)$$
$$L(\theta^{\omega}) = \mathbb{E}_{\omega}\!\left[\left(Q(s_t, a_t \mid \theta^{\omega}) - y_t\right)^2\right]$$
$$\omega_{t+1} = \omega_t + n_c \nabla_{\theta^{\omega}} L(\theta^{\omega})$$

where $n_c$ is the learning rate of the critic network. In fact, the meaning of critic network training is to minimize the difference between $y_t$ and $Q(s_t, a_t \mid \theta^{\omega})$.…”
Section: Algorithm
confidence: 99%
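
A minimal PyTorch sketch of the critic update implied by these equations is given below; it is a generic illustration rather than the cited implementation, terminal-state handling is omitted to mirror the equations, and the network sizes, batch, and learning rate $n_c$ are assumptions.

```python
# Hypothetical sketch of the critic (value network) update for a deterministic policy.
import torch
import torch.nn as nn

state_dim, action_dim, gamma, n_c = 8, 2, 0.99, 1e-3

critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                     # Q(s, a | theta^omega)
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())  # deterministic policy u(s)
critic_opt = torch.optim.Adam(critic.parameters(), lr=n_c)

# Placeholder transition batch (stand-in for a replay-buffer sample).
s = torch.randn(64, state_dim); a = torch.randn(64, action_dim)
r = torch.randn(64, 1); s_next = torch.randn(64, state_dim)

with torch.no_grad():
    # y_t = r(s_t, a_t) + gamma * Q(s_{t+1}, u(s_{t+1}) | theta^omega)
    y = r + gamma * critic(torch.cat([s_next, actor(s_next)], dim=1))

# L(theta^omega) = E[(Q(s_t, a_t | theta^omega) - y_t)^2], minimized by a gradient step on the critic.
q = critic(torch.cat([s, a], dim=1))
critic_loss = nn.functional.mse_loss(q, y)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
```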
“…Regarding the actor network, the function $J$ [45] is utilized to output a deterministic value through a deterministic strategy gradient, as expressed in the following equations:

$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{s_t \sim \rho^{u}}\!\left[\nabla_{\theta^{\mu}} Q(s, a \mid \theta^{\omega})\big|_{s=s_i,\, a=\mu(s_i)}\right] = \mathbb{E}_{s_t \sim \rho^{u}}\!\left[\nabla_{a} Q(s, a \mid \theta^{\omega})\big|_{s=s_i,\, a=\mu(s_i)} \cdot \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}\right]$$
$$\mu_{t+1} = \mu_t + n_a \nabla_{\theta^{\mu}} J$$

where $n_a$ is the learning rate of the actor network. The purpose of actor network training is to maximize $Q(s, a)$.…”
Section: Algorithm
confidence: 99%
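
The corresponding actor update can be sketched in the same style; again a generic PyTorch illustration rather than the cited implementation, with the network sizes, the state batch, and the learning rate $n_a$ assumed. Autograd composes $\nabla_a Q$ with $\nabla_{\theta^{\mu}} \mu$, matching the chain-rule form of the gradient above.

```python
# Hypothetical sketch of the deterministic policy-gradient actor update.
import torch
import torch.nn as nn

state_dim, action_dim, n_a = 8, 2, 1e-4
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                     # Q(s, a | theta^omega)
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())  # mu(s | theta^mu)
actor_opt = torch.optim.Adam(actor.parameters(), lr=n_a)

s = torch.randn(64, state_dim)                               # placeholder state batch

# Maximizing J is implemented as minimizing -Q(s, mu(s) | theta^omega);
# only the actor optimizer steps, so the critic parameters stay unchanged here.
actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```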