2020
DOI: 10.1016/j.ins.2020.03.105

Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving

Cited by 61 publications (25 citation statements)
References 16 publications
“…PSO is a random search algorithm based on group collaboration, developed by simulating the foraging behavior of bird swarms (Santosh and Ashok, 2020; Zhang and Huang, 2020; Zhang and Liu, 2019). Because of its convergence drawbacks, research on PSO has mainly focused on improving and optimizing the population structure and the corresponding parameters (Li et al., 2020; Esmat et al., 2020). To solve the path planning problem in an unknown environment and improve the convergence speed of the PSO algorithm, Di et al. (2020) proposed an improved PSO method based on a bionic neural network, using the bionic neural network to train the PSO algorithm.…”
Section: Related Work
confidence: 99%
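
A minimal sketch of the baseline PSO loop described in this statement, written in Python/NumPy for illustration only; it is not the improved bionic-neural-network variant proposed by Di et al. (2020), and the function name `pso`, the hyperparameters `w`, `c1`, `c2`, `n_particles`, and the sphere-function example are all assumptions.

```python
# Hypothetical, minimal particle swarm optimization sketch (illustration only).
import numpy as np

def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()                                   # personal best positions
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()         # global best position

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive pull toward pbest + social pull toward gbest.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Example: minimize the sphere function in 5 dimensions.
best_x, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=5, bounds=(-5.0, 5.0))
```

The slow convergence such a plain loop can show on harder objectives is exactly what the parameter- and structure-oriented improvements cited above target.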
“…Then, every N steps, the current value network parameters are copied to the target value network parameters, which stabilizes the training process and makes the model easier to converge. Considering that the traditional DQN algorithm suffers from over-estimation of the Q value, weak directivity, and poor stability, several improved DQN methods have been proposed [42][43][44].…”
Section: Deep Q-network
confidence: 99%
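
The periodic parameter copy described in this statement can be sketched in PyTorch as follows; this is a generic illustration, not the authors' implementation, and the sync interval `N`, network sizes, discount factor, and the placeholder transition batch are assumptions (a real agent would sample from a replay buffer and handle terminal states).

```python
# Hypothetical sketch of DQN training with a periodically synced target network.
import torch
import torch.nn as nn

def make_q_net(state_dim=4, n_actions=2):
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

q_net = make_q_net()                          # current value network
target_net = make_q_net()                     # target value network
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
N, gamma = 100, 0.99                          # sync interval and discount factor

for step in range(1, 1001):
    # Placeholder transition batch (stand-in for a replay-buffer sample).
    s = torch.randn(32, 4); a = torch.randint(0, 2, (32, 1))
    r = torch.randn(32, 1); s_next = torch.randn(32, 4)

    with torch.no_grad():                     # target computed from the frozen target network
        y = r + gamma * target_net(s_next).max(dim=1, keepdim=True).values
    q = q_net(s).gather(1, a)                 # Q value of the action actually taken
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

    if step % N == 0:                         # every N steps, copy parameters into the target network
        target_net.load_state_dict(q_net.state_dict())
```

Because the target still takes a max over the network's own value estimates, this vanilla scheme retains the over-estimation tendency the statement mentions, which is what the cited improved DQN methods address.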
“…Concerning the critic network, the objective is to minimize the difference between the Q value calculated in the current state and the target Q value, which can be updated by the loss function as follows [45]:

$$y_t = r(s_t, a_t) + \gamma\, Q\!\left(s_{t+1}, u(s_{t+1}) \mid \theta^{\omega}\right)$$
$$L(\theta^{\omega}) = \mathbb{E}_{\omega}\!\left[\left(Q(s_t, a_t \mid \theta^{\omega}) - y_t\right)^2\right]$$
$$\omega_{t+1} = \omega_t + n_c \nabla_{\theta^{\omega}} L(\theta^{\omega})$$

where $n_c$ is the learning rate of the critic network. In fact, the meaning of critic network training is to minimize the difference between $y_t$ and $Q(s_t, a_t \mid \theta^{\omega})$.…”
Section: Algorithm
confidence: 99%
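
A minimal PyTorch sketch of the critic update implied by these equations is given below; it is a generic illustration rather than the cited implementation, terminal-state handling is omitted to mirror the equations, and the network sizes, batch, and learning rate $n_c$ are assumptions.

```python
# Hypothetical sketch of the critic (value network) update for a deterministic policy.
import torch
import torch.nn as nn

state_dim, action_dim, gamma, n_c = 8, 2, 0.99, 1e-3

critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                     # Q(s, a | theta^omega)
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())  # deterministic policy u(s)
critic_opt = torch.optim.Adam(critic.parameters(), lr=n_c)

# Placeholder transition batch (stand-in for a replay-buffer sample).
s = torch.randn(64, state_dim); a = torch.randn(64, action_dim)
r = torch.randn(64, 1); s_next = torch.randn(64, state_dim)

with torch.no_grad():
    # y_t = r(s_t, a_t) + gamma * Q(s_{t+1}, u(s_{t+1}) | theta^omega)
    y = r + gamma * critic(torch.cat([s_next, actor(s_next)], dim=1))

# L(theta^omega) = E[(Q(s_t, a_t | theta^omega) - y_t)^2], minimized by a gradient step on the critic.
q = critic(torch.cat([s, a], dim=1))
critic_loss = nn.functional.mse_loss(q, y)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
```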
“…Regarding the actor network, the function $J$ [45] is utilized to output a deterministic value through a deterministic strategy gradient, as expressed in the following equations:

$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{s_t \sim \rho^{u}}\!\left[\nabla_{\theta^{\mu}} Q(s, a \mid \theta^{\omega})\big|_{s=s_i,\, a=\mu(s_i)}\right] = \mathbb{E}_{s_t \sim \rho^{u}}\!\left[\nabla_{a} Q(s, a \mid \theta^{\omega})\big|_{s=s_i,\, a=\mu(s_i)} \cdot \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}\right]$$
$$\mu_{t+1} = \mu_t + n_a \nabla_{\theta^{\mu}} J$$

where $n_a$ is the learning rate of the actor network. The purpose of actor network training is to maximize $Q(s, a)$.…”
Section: Algorithm
confidence: 99%
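
The corresponding actor update can be sketched in the same style; again a generic PyTorch illustration rather than the cited implementation, with the network sizes, the state batch, and the learning rate $n_a$ assumed. Autograd composes $\nabla_a Q$ with $\nabla_{\theta^{\mu}} \mu$, matching the chain-rule form of the gradient above.

```python
# Hypothetical sketch of the deterministic policy-gradient actor update.
import torch
import torch.nn as nn

state_dim, action_dim, n_a = 8, 2, 1e-4
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                     # Q(s, a | theta^omega)
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())  # mu(s | theta^mu)
actor_opt = torch.optim.Adam(actor.parameters(), lr=n_a)

s = torch.randn(64, state_dim)                               # placeholder state batch

# Maximizing J is implemented as minimizing -Q(s, mu(s) | theta^omega);
# only the actor optimizer steps, so the critic parameters stay unchanged here.
actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```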