Objectives: To study an algorithm that controls a bipedal robot so that it walks with a gait close to that of a human. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is known to be highly efficient for continuous action space problems in reinforcement learning; it makes a few changes to the commonly used Deep Deterministic Policy Gradient (DDPG) algorithm. Methods: Unlike the sparse reward function model that is usually used, this study proposes a reward model that combines a sparse reward function with a dense reward function. The TD3 algorithm is applied together with the proposed reward model to control a bipedal robot model with 6 degrees of freedom (DOF). The training process is simulated in the Gazebo/Robot Operating System (ROS) environment. Findings: The results show that choosing a combined sparse and dense reward model suited to the robot model helps the robot learn faster and achieve better results. The biped robot can walk straight with an almost human-like gait. The results from the TD3 algorithm with the proposed reward model are also compared with the results from other algorithms. Novelty: The TD3 algorithm is combined with the proposed reward model for a 6-DOF biped robot, and the robot's gait is simulated in the Gazebo/ROS environment. Because ROS is a middleware, it can be used to control a robot in a real environment in the future.
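The abstract does not give the form of the proposed reward model, so the following is only a minimal sketch of what combining a sparse and a dense reward can look like. Every name, signal, and weight here (`StepInfo`, `forward_velocity`, `torso_pitch`, `w_vel`, `w_tilt`, the fall penalty, and the goal bonus) is a hypothetical choice made for illustration, not a value from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step signals read from the simulator (hypothetical fields)."""
    forward_velocity: float  # m/s along the walking direction
    torso_pitch: float       # rad, deviation of the torso from upright
    has_fallen: bool         # True once the robot has fallen over
    reached_goal: bool       # True once the target distance is covered


def dense_reward(info: StepInfo, w_vel: float = 1.0, w_tilt: float = 0.5) -> float:
    """Shaping term available at every step: reward progress, penalise tilt."""
    return w_vel * info.forward_velocity - w_tilt * abs(info.torso_pitch)


def sparse_reward(info: StepInfo, fall_penalty: float = -100.0,
                  goal_bonus: float = 100.0) -> float:
    """Event term that is nonzero only on a fall or on reaching the goal."""
    if info.has_fallen:
        return fall_penalty
    if info.reached_goal:
        return goal_bonus
    return 0.0


def combined_reward(info: StepInfo) -> float:
    """Sum of the dense shaping term and the sparse event term."""
    return dense_reward(info) + sparse_reward(info)
```

In this sketch the dense term gives the agent a learning signal on every step, while the sparse term marks the outcomes that matter most, such as falling. The relative weighting of the two would need tuning for a specific robot model.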
In recent years, much research has used reinforcement learning algorithms to train two-legged robots to move, but many challenges remain. The authors propose the GCTD3 method, which uses Graph Convolutional Networks to represent the kinematic link features of the robot and combines them with the Twin-Delayed Deep Deterministic Policy Gradient algorithm to train the robot to move. Graph Convolutional Networks are very effective on graph-structured problems such as the connections between the joints of human-like robots. The GCTD3 method shows better results on the motion trajectories of the bipedal robot joints than other reinforcement learning algorithms such as Twin-Delayed Deep Deterministic Policy Gradient, Deep Deterministic Policy Gradient, and Soft Actor Critic. This research is implemented on a two-legged robot model with six independent joint coordinates through the Robot Operating System and the Gazebo simulator.
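To illustrate how a graph convolution can mix features between connected joints, here is a minimal pure-Python sketch of one layer using the common symmetric-normalised propagation rule, ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The joint names, the edge list (including the hip-to-hip link), and the feature sizes are assumptions for illustration. The abstract does not describe the GCTD3 network layout, so this should not be read as the authors' architecture.

```python
import math

# Hypothetical kinematic graph for a six-joint biped: one node per joint.
JOINTS = ["l_hip", "l_knee", "l_ankle", "r_hip", "r_knee", "r_ankle"]
# Assumed links: hip-knee and knee-ankle on each leg, plus hip-hip via the pelvis.
EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3)]


def normalized_adjacency(n, edges):
    """Return D^(-1/2) (A + I) D^(-1/2) as a nested list."""
    # Start from the identity so that every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # undirected edge
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_hat, h, w):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]
```

As a usage example, `gcn_layer(normalized_adjacency(6, EDGES), h, w)` with `h` a 6-by-F list of per-joint features (for instance joint angle and velocity) and `w` an F-by-K weight matrix returns a 6-by-K list of joint embeddings. In a GCTD3-style design, embeddings of this kind would then feed the actor and critic networks; how the paper does this is not specified in the abstract.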