2020 Chinese Control and Decision Conference (CCDC)
DOI: 10.1109/ccdc49329.2020.9164410
An end-to-end learning of driving strategies based on DDPG and imitation learning

Cited by 13 publications (8 citation statements)
References 10 publications
“…In the hidden layer, each neuron uses a rectified linear unit (ReLU) activation function that converts its input signal into an output signal. The last output layer of the actor network uses the hyperbolic tangent (tanh) activation function, which maps real numbers to the range [-1, 1].…”
Section: Algorithm Framework
confidence: 99%
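The excerpt above describes a standard DDPG actor head. A minimal sketch in PyTorch, with illustrative layer sizes and state/action dimensions that are assumptions rather than values from the cited paper:

```python
# Sketch of an actor network as described above: ReLU hidden layers, tanh output
# squashing each action component into [-1, 1]. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),                      # ReLU activation in the hidden layers
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),                      # tanh maps outputs into [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Usage example with assumed dimensions (not from the cited paper).
actor = Actor(state_dim=29, action_dim=3)
action = actor(torch.zeros(1, 29))          # every component lies in [-1, 1]
```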
“…[19] Many researchers have also used deep reinforcement learning methods for vehicle decision-control research that imitates driver behavior [20-22]. Zhu Meixin [23] used driver data and deep reinforcement learning methods to establish a human-like autonomous car-following model. However, the above research used the gap in relative speed and relative distance at each moment between the driver and the deep reinforcement learning car-following model to construct the reward function.…”
Section: Introduction
confidence: 99%
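For illustration, a minimal sketch of the kind of per-step imitation reward the excerpt refers to: the learned car-following policy is penalized for deviating from the recorded driver in spacing and relative speed. The weights and signal names are assumptions.

```python
# Reward built from the per-step gap between the model and the human driver.
# Larger deviation from the driver yields a lower (more negative) reward.
def car_following_reward(spacing_model, spacing_driver,
                         rel_speed_model, rel_speed_driver,
                         w_spacing=1.0, w_speed=0.5):
    spacing_gap = abs(spacing_model - spacing_driver)
    speed_gap = abs(rel_speed_model - rel_speed_driver)
    return -(w_spacing * spacing_gap + w_speed * speed_gap)
```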
“…The complementarity between IL and RL has been motivating researchers to combine the benefits of both technologies. Current methods fall into two categories: 1) initializing the RL policy network with IL before starting exploration [27], and 2) loading the demonstration transitions into the replay buffer [28], [29] to guide the RL process. Within these methods, the prior knowledge from supervised data provides a foundation for further self-optimization via RL, which can be regarded as the unity of knowledge and action.…”
Section: Combining IL and RL in Robotics
confidence: 99%
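A minimal sketch of the two combination strategies named in the excerpt, written in PyTorch against an assumed replay-buffer interface; all names here are illustrative, not taken from the cited works.

```python
import torch.nn.functional as F

def pretrain_actor_with_il(actor, optimizer, demo_states, demo_actions, epochs=10):
    """Strategy 1: initialize the RL policy network with IL (behavior cloning)."""
    for _ in range(epochs):
        pred = actor(demo_states)
        loss = F.mse_loss(pred, demo_actions)    # supervised regression onto expert actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def preload_replay_buffer(buffer, demo_transitions):
    """Strategy 2: seed the RL replay buffer with demonstration transitions."""
    for (s, a, r, s_next, done) in demo_transitions:
        buffer.add(s, a, r, s_next, done)        # demos guide early off-policy updates
```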
“…• DDPGfD: This is a method that combines IL and RL, which modifies the original DDPG by preloading some demonstration transitions into the replay buffer and keeping them forever when training DDPG [28], [29].…”
Section: A Comparative Study in Simulation
confidence: 99%
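A minimal sketch of a DDPGfD-style replay buffer as described in the excerpt: demonstration transitions are stored once and never evicted, while self-generated transitions are overwritten in a ring buffer once capacity is reached. The interface is an assumption.

```python
import random

class DemoPreservingReplayBuffer:
    def __init__(self, capacity, demo_transitions):
        self.demos = list(demo_transitions)      # kept for the whole training run
        self.agent = []                          # self-generated transitions
        self.capacity = capacity
        self.pos = 0

    def add(self, transition):
        if len(self.agent) < self.capacity:
            self.agent.append(transition)
        else:
            self.agent[self.pos] = transition    # overwrite oldest agent transition only
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Demonstrations remain eligible for sampling forever.
        return random.sample(self.demos + self.agent, batch_size)
```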
“…Among them, the Deep Deterministic Policy Gradient (DDPG) is widely used because of its excellent ability to observe and execute actions instantly in terms of individual intelligence [6], such as for the robotic arms to achieve high precise actions [7], for the Autonomous Underwater Vehicles (AUVs) to patrol intelligently [8]. However, DDPG has problems such as behavioral convergence failure and low training efficiency when dealing with multi-agent environment behavior problems [9,10].…”
Section: Introduction
confidence: 99%
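For reference, a minimal sketch of the core single-agent DDPG update the excerpt alludes to: a deterministic actor is evaluated by a Q critic, with target networks stabilizing the bootstrapped target. Network signatures and variable names are assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    # batch holds tensors; done is 1.0 where the episode ended, else 0.0.
    s, a, r, s_next, done = batch

    with torch.no_grad():
        # Bootstrapped target uses the target actor's deterministic action.
        q_next = target_critic(s_next, target_actor(s_next))
        target = r + gamma * (1.0 - done) * q_next

    critic_loss = F.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor ascends the critic's value of its own deterministic actions.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```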