2020
DOI: 10.1109/lra.2020.2967299

High-Speed Autonomous Drifting With Deep Reinforcement Learning

Abstract: Drifting is a complicated task for autonomous vehicle control. Most traditional methods in this area are based on motion equations derived from an understanding of vehicle dynamics, which are difficult to model precisely. We propose a robust drift controller without explicit motion equations, based on the latest model-free deep reinforcement learning algorithm, soft actor-critic. The drift control problem is formulated as a trajectory following task, where the error-based state and reward are designed…
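As a rough illustration of the error-based formulation sketched in the abstract, the snippet below assembles a state vector from tracking errors against a reference drift trajectory and converts it into a scalar reward. The error terms, dictionary keys, and weights are illustrative assumptions for this sketch, not the paper's exact design.

import numpy as np

def error_state(vehicle, ref):
    """Assemble an error-based state vector for trajectory following.

    `vehicle` and `ref` are assumed to be dicts with position (x, y),
    heading (yaw, rad), speed (m/s) and slip angle (rad); the paper's
    actual state layout may differ.
    """
    e_pos = np.hypot(vehicle["x"] - ref["x"], vehicle["y"] - ref["y"])
    e_yaw = np.arctan2(np.sin(vehicle["yaw"] - ref["yaw"]),
                       np.cos(vehicle["yaw"] - ref["yaw"]))  # wrap heading error to [-pi, pi]
    e_vel = vehicle["speed"] - ref["speed"]
    e_slip = vehicle["slip"] - ref["slip"]
    return np.array([e_pos, e_yaw, e_vel, e_slip], dtype=np.float32)

def error_reward(state, weights=(1.0, 0.5, 0.1, 0.5)):
    """Reward that grows as the tracking errors shrink (illustrative weights)."""
    w = np.asarray(weights, dtype=np.float32)
    return float(np.exp(-np.sum(w * np.abs(state))))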

Cited by 96 publications (43 citation statements). References 18 publications.
“…The proposed framework has been validated in a simulated environment with various velocity settings, which confirms that it performs better than other frameworks. For future work, we will include the action smoothing strategy [19] in our framework so that the robot can generate smoother trajectories at higher angular velocity settings during training.…”
Section: Discussion
confidence: 99%
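The action smoothing strategy cited as [19] above blends the newly sampled action with the previously executed one. A minimal sketch, assuming a fixed smoothing weight k and array-valued actions (the cited work may tune or schedule this weight differently):

import numpy as np

class ActionSmoother:
    """Blend the current policy action with the previous command.

    a_smooth = k * a_new + (1 - k) * a_prev, where k is an assumed constant here.
    """
    def __init__(self, k=0.6, dim=2):
        self.k = k
        self.prev = np.zeros(dim, dtype=np.float32)

    def __call__(self, a_new):
        a_smooth = self.k * np.asarray(a_new, dtype=np.float32) + (1.0 - self.k) * self.prev
        self.prev = a_smooth  # remember the executed command for the next step
        return a_smooth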
“…To solve the second problem, several studies such as [18] and [19] include the DRL agent's output action, i.e., the velocity, as a magnitude term in the reward function so that the agent can generate high velocities for autonomous outdoor vehicles. In the context of DRL-based robot navigation tasks, although the agent is pushed to generate higher velocities as in [20] and [21], only small values are set for the robot's maximum velocities, which prevents it from navigating quickly.…”
Section: Introduction
confidence: 99%
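The statement above describes adding the magnitude of the commanded velocity to the reward so the agent does not settle into slow, overly cautious motion. A hedged sketch of such a shaping term, with illustrative weights and a hypothetical collision flag; the cited works define their own reward terms:

def navigation_reward(linear_vel, dist_to_goal_prev, dist_to_goal, collision,
                      w_progress=1.0, w_speed=0.2, collision_penalty=10.0):
    """Progress-based reward with a bonus proportional to commanded speed."""
    if collision:
        return -collision_penalty
    progress = dist_to_goal_prev - dist_to_goal  # positive when moving toward the goal
    return w_progress * progress + w_speed * abs(linear_vel)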
“…Recently, a data-based approach was proposed for analyzing the stability of discrete-time nonlinear stochastic systems modeled as Markov decision processes, using the classic Lyapunov method from control theory [29]. To address the limited exploration ability caused by a deterministic policy, high-speed autonomous drifting is tackled in [30] with a closed-loop controller based on the deep RL algorithm soft actor-critic (SAC), which controls the steering angle and throttle of simulated vehicles. We should note that deep reinforcement learning algorithms always require time-consuming training episodes.…”
Section: Introduction (A. Related Work)
confidence: 99%
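The controller described in [30] outputs continuous steering and throttle commands. The mapping below is an assumed interface for illustration: a tanh-squashed policy output in [-1, 1]^2 rescaled to hypothetical actuator limits, not the paper's exact action space:

import numpy as np

STEER_MAX = 0.8    # rad, assumed steering limit
THROTTLE_MAX = 1.0  # normalized throttle

def to_vehicle_command(raw_action):
    """Map a squashed policy output in [-1, 1]^2 to (steering, throttle)."""
    a = np.tanh(np.asarray(raw_action, dtype=np.float32))  # keep outputs bounded
    steering = float(a[0]) * STEER_MAX
    throttle = (float(a[1]) + 1.0) / 2.0 * THROTTLE_MAX     # rescale [-1, 1] to [0, 1]
    return steering, throttle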
“…In our proposed method, we modify the prior work of Frans et al. [14] by dividing the training process into two sequential stages: one to obtain the optimal policy for each sub-task and one to acquire the optimal meta policy. Moreover, we also introduce a module that integrates the actions generated by those policies by applying the action smoothing strategy [16], which weights the current action and the previous action so that the robot can generate smooth and safe actions. 2) We show the implementation of our proposed method in the case of person-following robot training.…”
Section: Introduction
confidence: 99%
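The integration module described above combines actions from the sub-task policies under the meta policy and then applies the smoothing weights. The composition below is an assumption for illustration, with hypothetical policy callables, not the authors' implementation:

import numpy as np

def integrate_actions(meta_policy, sub_policies, obs, prev_action, k=0.6):
    """Pick a sub-policy with the meta policy, then smooth its action.

    `meta_policy(obs)` is assumed to return an index into `sub_policies`;
    each sub-policy returns an action array shaped like `prev_action`.
    """
    idx = meta_policy(obs)
    a_new = np.asarray(sub_policies[idx](obs), dtype=np.float32)
    return k * a_new + (1.0 - k) * np.asarray(prev_action, dtype=np.float32)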
“…3) We introduce a novel method called weight-scheduled action smoothing for attending-task training, which does not prevent exploration by the RL agent and enables the robot to generate smoother actions while attending to the target person. Since smoothing the robot's actions may prevent the RL agent from finding the right or left attending goals around the target person, we modify the action smoothing strategy [16] and follow the curriculum learning strategy [17] to schedule the smoothing weights for the current and previous actions during the attending-task training procedure.…”
Section: Introduction
confidence: 99%
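The weight-scheduled variant described here adjusts the smoothing weight over training so that early exploration is not suppressed while later actions become smoother. The linear schedule below is an assumption for illustration; the cited work defines its own curriculum:

def scheduled_smoothing_weight(step, total_steps, k_start=1.0, k_end=0.5):
    """Linearly anneal the weight on the current action from k_start to k_end.

    k = 1.0 means no smoothing (free exploration); smaller k leans more on the
    previous action, yielding smoother behaviour late in training.
    """
    frac = min(max(step / float(total_steps), 0.0), 1.0)
    return k_start + frac * (k_end - k_start)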