Path Planning for Multi-Arm Manipulators Using Deep Reinforcement Learning: Soft Actor–Critic with Hindsight Experience Replay

The paper addresses the problem of using machine learning in practical robot applications, such as dynamic path planning with obstacle avoidance, so as to achieve the performance level of machine learning model scorers in terms of speed and reliability, together with the safety and accuracy level of possibly slower, exact algorithmic solutions to the same problems. To this end, the existing simplex architecture for safety assurance in critical systems is extended by an adaptation mechanism, in which one of the redundant controllers (called a high-performance controller) is represented by a trained machine learning model. This model is retrained using field data to reduce its failure rate and is redeployed continuously. The proposed adaptive simplex architecture (ASA) is evaluated on a robot path planning application with dynamic obstacle avoidance in two human-robot collaboration scenarios in manufacturing. The evaluation results indicate that ASA enables the robot to respond in real time when it encounters an obstacle. The path predicted by the model is economical in terms of path length and smoother than analogous algorithmic solutions. ASA ensures safety by providing an acceptance test, which checks whether the predicted path crosses the obstacle; if it does, a suboptimal, yet safe, solution is used instead.
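The acceptance-test idea described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 2D point-and-circle geometry, the function names (`path_is_safe`, `simplex_plan`), and the two toy planners, where `straight_line` stands in for the trained model and `detour` stands in for the slower, safe algorithm.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Path = List[Point]
Planner = Callable[[Point, Point, Point, float], Path]


def segment_hits_circle(p: Point, q: Point, center: Point, radius: float) -> bool:
    """Return True if segment pq touches or enters the circular obstacle."""
    (px, py), (qx, qy), (cx, cy) = p, q, center
    dx, dy = qx - px, qy - py
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: p and q coincide
    else:
        # Projection of the center onto the segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((cx - px) * dx + (cy - py) * dy) / seg_len_sq))
    nx, ny = px + t * dx, py + t * dy
    return (nx - cx) ** 2 + (ny - cy) ** 2 <= radius ** 2


def path_is_safe(path: Path, center: Point, radius: float) -> bool:
    """Acceptance test: no segment of the path may cross the obstacle."""
    return not any(
        segment_hits_circle(a, b, center, radius) for a, b in zip(path, path[1:])
    )


def simplex_plan(start: Point, goal: Point, center: Point, radius: float,
                 fast_planner: Planner, safe_planner: Planner) -> Tuple[Path, str]:
    """Use the fast planner's path if it passes the test, else fall back."""
    candidate = fast_planner(start, goal, center, radius)
    if path_is_safe(candidate, center, radius):
        return candidate, "high-performance"
    return safe_planner(start, goal, center, radius), "fallback"


def straight_line(start: Point, goal: Point, center: Point, radius: float) -> Path:
    """Hypothetical stand-in for the learned model: ignores the obstacle."""
    return [start, goal]


def detour(start: Point, goal: Point, center: Point, radius: float) -> Path:
    """Hypothetical stand-in for the safe planner: one waypoint beside the
    obstacle. This is a toy for the example below, not a general safe planner."""
    return [start, (center[0], center[1] + 3.0 * radius), goal]


blocked_path, blocked_source = simplex_plan(
    (0.0, 0.0), (1.0, 0.0), (0.5, 0.0), 0.2, straight_line, detour)
clear_path, clear_source = simplex_plan(
    (0.0, 0.0), (1.0, 0.0), (0.5, 1.0), 0.2, straight_line, detour)
print(blocked_source, clear_source)
```

In the first call the straight path runs through the obstacle, so the fallback planner is used; in the second call the obstacle is out of the way and the fast path is accepted. In an adaptive variant, rejected cases would also be logged as field data for retraining, which this sketch omits.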

Section: Discussion (mentioning)

confidence: 99%

Section: Related Work (mentioning)

confidence: 99%

Adaptive Simplex Architecture for Safe, Real-Time Robot Path Planning

Ionescu

2021

“…Many strategies and algorithms of path planning for a manipulator have been proposed in the literature. These works mainly focused on the two aspects: reactive planning and map-based planning [19, 20]. For reactive planning, the robot has a perception system that allows it to know the environment in which it performs its task, and its main application is for environments with dynamic obstacles.…”

Section: Introduction (mentioning)

confidence: 99%

Exploring a Novel Multiple-Query Resistive Grid-Based Planning Method Applied to High-DOF Robotic Manipulators

Huerta-Chua

Diaz-Arango

Vázquez-Leal

et al. 2021

The applicability of the path planning strategy to robotic manipulators has been an exciting topic for researchers in the last few decades due to the large demand in the industrial sector and its enormous potential development for space, surgical, and pharmaceutical applications. The automation of high-degree-of-freedom (DOF) manipulator robots is a challenging task due to the high redundancy in the end-effector position. Additionally, in the presence of obstacles in the workspace, the task becomes even more complicated. Therefore, for decades, the most common method of integrating a manipulator in an industrial automated process has been the demonstration technique through human operator intervention. Although it is a simple strategy, some drawbacks must be considered: first, the path’s success, length, and execution time depend on operator experience; second, although the planning task is easy for a structured environment with few objects, most typical industrial environments contain many obstacles, which makes planning a collision-free trajectory challenging. In this paper, a multiple-query method capable of obtaining collision-free paths for high-DOF manipulators with multiple surrounding obstacles is presented. The proposed method is inspired by the resistive grid-based planner method (RGBPM). Furthermore, several improvements are implemented to solve complex planning problems that cannot be handled by the original formulation. The most important features of the proposed planner are as follows: (1) easy implementation for robotic manipulators with multiple degrees of freedom, (2) the ability to handle dozens of obstacles in the environment, (3) compatibility with various obstacle representations using mathematical models, (4) a new strategy that recycles a previous simulation to convert the RGBPM into a multiple-query planner, and (5) the capacity to handle large sparse matrices representing the configuration space.
A numerical simulation was carried out to validate the proposed planning method’s effectiveness for manipulators with three, five, and six DOFs in environments with dozens of surrounding obstacles. The case study results show the applicability of the proposed novel strategy in quickly computing new collision-free paths using the first execution data. Each new query requires less than 0.2 s for a three-DOF manipulator in a configuration space free-modeled by a 7291 × 7291 sparse matrix and less than 30 s for five- and six-DOF manipulators in a configuration space free-modeled by 313,958 × 313,958 and 204,087 × 204,087 sparse matrices, respectively. Finally, a simulation was conducted to validate the proposed multiple-query RGBPM planner’s efficacy in finding feasible paths without collision using a six-DOF manipulator (KUKA LBR iiwa 14R820) in a complex environment with dozens of surrounding obstacles.

“…In recent years, SAC has been widely used in autonomous decision-making, intelligent planning, and motion control of mobile robots, UAVs, and manipulators. Prianto et al. [22] presented a deep reinforcement learning-based path planning algorithm for the multi-arm manipulator. To solve the problem of high-dimensional path planning, SAC is used to enhance the exploration performance of the robotic arm.…”

Section: Introduction (mentioning)

confidence: 99%

End-to-End AUV Motion Planning Method Based on Soft Actor-Critic

Sun

Wang

et al. 2021

This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor–critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. Generative adversarial imitation learning (GAIL) is also used to assist training, because learning a policy from scratch is difficult and time-consuming in reinforcement learning. A comprehensive external reward function is then designed to help the AUV smoothly reach the target point while optimizing distance and time as much as possible. Finally, the proposed end-to-end motion planning algorithm is tested and compared on the Unity simulation platform. Results show that the algorithm has an optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL can speed up AUV training and reduce the training time without affecting the planning effect of the SAC algorithm.
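For context, the exploration benefit attributed to SAC in these abstracts is usually explained by its maximum-entropy objective. The form below is the standard textbook formulation, not one quoted from any of the papers listed here; the symbols ($\rho_\pi$ for the state-action distribution induced by policy $\pi$, $\alpha$ for the temperature, $\mathcal{H}$ for entropy) are notation assumed for this note.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = - \mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

A larger $\alpha$ rewards more random policies and therefore more exploration; as $\alpha \to 0$ the objective reduces to the ordinary expected return.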