2020 Chinese Control and Decision Conference (CCDC)
DOI: 10.1109/ccdc49329.2020.9164410
An end-to-end learning of driving strategies based on DDPG and imitation learning

Cited by 13 publications (8 citation statements)
References 10 publications
“…In the hidden layer, each neuron uses a rectified linear unit (ReLU) activation function that converts its input signal into an output signal. The last output layer of the actor network uses the hyperbolic tangent (tanh) activation function, which maps real numbers to the range [-1, 1].…”
Section: Algorithm Framework
confidence: 99%
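The excerpt above describes a standard DDPG actor head. A minimal sketch in PyTorch, with illustrative layer sizes and state/action dimensions that are assumptions rather than values from the cited paper:

```python
# Sketch of an actor network as described above: ReLU hidden layers, tanh output
# squashing each action component into [-1, 1]. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),                      # ReLU activation in the hidden layers
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),                      # tanh maps outputs into [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Usage example with assumed dimensions (not from the cited paper).
actor = Actor(state_dim=29, action_dim=3)
action = actor(torch.zeros(1, 29))          # every component lies in [-1, 1]
```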
“…[19] Many researchers have also used deep reinforcement learning methods for vehicle decision-control research that imitates driver behavior [20-22]. Zhu Meixin [23] used driver data and deep reinforcement learning methods to establish a human-like autonomous car-following model. However, the above research used the gap in relative speed and relative distance at each moment between the driver and the deep reinforcement learning car-following model to construct the reward function.…”
Section: Introduction
confidence: 99%
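For illustration, a minimal sketch of the kind of per-step imitation reward the excerpt refers to: the learned car-following policy is penalized for deviating from the recorded driver in spacing and relative speed. The weights and signal names are assumptions.

```python
# Reward built from the per-step gap between the model and the human driver.
# Larger deviation from the driver yields a lower (more negative) reward.
def car_following_reward(spacing_model, spacing_driver,
                         rel_speed_model, rel_speed_driver,
                         w_spacing=1.0, w_speed=0.5):
    spacing_gap = abs(spacing_model - spacing_driver)
    speed_gap = abs(rel_speed_model - rel_speed_driver)
    return -(w_spacing * spacing_gap + w_speed * speed_gap)
```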
“…The complementarity between IL and RL has been motivating researchers to combine the benefits of both technologies. Current methods fall into two categories: 1) initializing the RL policy network with IL before starting exploration [27], and 2) loading the demonstration transitions into the replay buffer [28], [29] to guide the RL process. Within these methods, the prior knowledge from supervised data provides a foundation for further self-optimization via RL, which can be regarded as the unity of knowledge and action.…”
Section: Combining IL and RL in Robotics
confidence: 99%
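A minimal sketch of the two combination strategies named in the excerpt, written in PyTorch against an assumed replay-buffer interface; all names here are illustrative, not taken from the cited works.

```python
import torch.nn.functional as F

def pretrain_actor_with_il(actor, optimizer, demo_states, demo_actions, epochs=10):
    """Strategy 1: initialize the RL policy network with IL (behavior cloning)."""
    for _ in range(epochs):
        pred = actor(demo_states)
        loss = F.mse_loss(pred, demo_actions)    # supervised regression onto expert actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def preload_replay_buffer(buffer, demo_transitions):
    """Strategy 2: seed the RL replay buffer with demonstration transitions."""
    for (s, a, r, s_next, done) in demo_transitions:
        buffer.add(s, a, r, s_next, done)        # demos guide early off-policy updates
```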
“…• DDPGfD: This is a method that combines IL and RL, which modifies the original DDPG by preloading some demonstration transitions into the replay buffer and keeping them forever when training DDPG [28], [29].…”
Section: A Comparative Study in Simulation
confidence: 99%
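A minimal sketch of a DDPGfD-style replay buffer as described in the excerpt: demonstration transitions are stored once and never evicted, while self-generated transitions are overwritten in a ring buffer once capacity is reached. The interface is an assumption.

```python
import random

class DemoPreservingReplayBuffer:
    def __init__(self, capacity, demo_transitions):
        self.demos = list(demo_transitions)      # kept for the whole training run
        self.agent = []                          # self-generated transitions
        self.capacity = capacity
        self.pos = 0

    def add(self, transition):
        if len(self.agent) < self.capacity:
            self.agent.append(transition)
        else:
            self.agent[self.pos] = transition    # overwrite oldest agent transition only
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Demonstrations remain eligible for sampling forever.
        return random.sample(self.demos + self.agent, batch_size)
```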
“…Among them, the Deep Deterministic Policy Gradient (DDPG) is widely used because of its excellent ability to observe and execute actions instantly in terms of individual intelligence [6], such as for the robotic arms to achieve high precise actions [7], for the Autonomous Underwater Vehicles (AUVs) to patrol intelligently [8]. However, DDPG has problems such as behavioral convergence failure and low training efficiency when dealing with multi-agent environment behavior problems [9,10].…”
Section: Introduction
confidence: 99%
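For reference, a minimal sketch of the core single-agent DDPG update the excerpt alludes to: a deterministic actor is evaluated by a Q critic, with target networks stabilizing the bootstrapped target. Network signatures and variable names are assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    # batch holds tensors; done is 1.0 where the episode ended, else 0.0.
    s, a, r, s_next, done = batch

    with torch.no_grad():
        # Bootstrapped target uses the target actor's deterministic action.
        q_next = target_critic(s_next, target_actor(s_next))
        target = r + gamma * (1.0 - done) * q_next

    critic_loss = F.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor ascends the critic's value of its own deterministic actions.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```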