A Multi-Target Trajectory Planning of a 6-DoF Free-Floating Space Robot via Reinforcement Learning

Wang, Shengjie; Zheng, Xiang; Cao, Yuxue; Zhang, Tao

doi:10.1109/iros51168.2021.9636681

Cited by 15 publications

(11 citation statements)

References 20 publications

“…For the free-floating space manipulator, Yan et al. proposed a trajectory planning method based on Soft Q-learning for a 3-DoF free-floating space robot [17]. To reach multiple targets within a large space, Wang et al. developed an improved version of Proximal Policy Optimization (PPO) for a 6-DoF space robot [25]. However, in our experiments, the method does not work well in a 12-DoF dual-arm environment.…”

Section: Related Work (mentioning)

confidence: 84%

“…Then a velocity-tracking PD controller takes a_t as input to generate the joint torques. Given that a position controller performs less smoothly, it is better to choose the velocity controller [25].…”

Section: Formulation Of Optimization Problem (mentioning)

confidence: 99%
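
The velocity-tracking PD controller described in the statement above can be illustrated with a minimal sketch. This is not the cited authors' implementation: the class name, the gain values, the time step, and the finite-difference derivative term are all assumptions chosen for illustration.

```python
import numpy as np


class VelocityPDController:
    """Hypothetical PD controller that tracks commanded joint velocities."""

    def __init__(self, kp, kd, dt, n_joints):
        self.kp = kp  # proportional gain (assumed value, not from the paper)
        self.kd = kd  # derivative gain (assumed value, not from the paper)
        self.dt = dt  # control period in seconds (assumed)
        self.prev_error = np.zeros(n_joints)

    def torque(self, target_vel, measured_vel):
        """Return joint torques from the velocity tracking error."""
        error = np.asarray(target_vel, dtype=float) - np.asarray(measured_vel, dtype=float)
        # Finite-difference estimate of the error derivative.
        d_error = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.kd * d_error


# Usage: the policy action a_t is treated as the target joint velocity.
controller = VelocityPDController(kp=2.0, kd=0.1, dt=0.01, n_joints=6)
a_t = np.ones(6)                # hypothetical action from the policy
joint_vel = np.zeros(6)         # hypothetical measured joint velocities
tau = controller.torque(a_t, joint_vel)
```

In this sketch the policy only has to output a desired velocity, and the low-level loop converts it to torques, which is one common way to structure such a controller.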

“…• Wang's method [25]: Wang's method solved the multi-target task for a single arm based on an improved version of the PPO algorithm.…”

Section: Comparison With Other Baselines (mentioning)

confidence: 99%

“…• SAC-D: Considering that the Soft Actor-Critic (SAC) algorithm achieves superior performance in many robotic control tasks [37], we designed the SAC-D algorithm for our task, in which we applied a reward function similar to that in [25].…”

Section: Comparison With Other Baselines (mentioning)

confidence: 99%

“…The number of hidden units per layer is 256, and the output-layer activation of the policy network is the Tanh function. Additionally, the inputs and outputs are the same variables, and the other parameters of each algorithm are taken from prior papers [18,25] or OpenAI Baselines [38].…”

Section: Comparison With Other Baselines (mentioning)

confidence: 99%
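
The network shape described in the statement above (256 hidden units per layer, Tanh on the policy output) can be sketched as follows. Only those two details come from the quoted text. The number of hidden layers, the ReLU hidden activation, the weight initialization, and the observation and action sizes are assumptions for illustration.

```python
import numpy as np


def init_policy(obs_dim, act_dim, hidden=256, n_hidden_layers=2, seed=0):
    """Create (weight, bias) pairs; the layer count is an assumed value."""
    rng = np.random.default_rng(seed)
    sizes = [obs_dim] + [hidden] * n_hidden_layers + [act_dim]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def policy_forward(params, obs):
    """Map an observation to an action bounded in [-1, 1] by Tanh."""
    h = np.asarray(obs, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden activation (assumption)
    w, b = params[-1]
    return np.tanh(h @ w + b)  # Tanh output layer, as stated in the quote


# Usage with hypothetical sizes: 10 observation values, 6 joint commands.
params = init_policy(obs_dim=10, act_dim=6)
action = policy_forward(params, np.ones(10))
```

A Tanh output keeps every action component within a fixed range, so it can be rescaled to joint velocity limits before being passed to a low-level controller.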

A Learning System for Motion Planning of Free-Float Dual-Arm Space Manipulator towards Non-Cooperative Object

Wang, Cao, Zheng et al. 2022

Preprint

Self Cite

Recent years have seen the emergence of non-cooperative objects in space, such as failed satellites and space junk. These objects are usually manipulated or collected by free-float dual-arm space manipulators. Because they eliminate the difficulties of modeling and manual parameter tuning, reinforcement learning (RL) methods have shown promise in the trajectory planning of space manipulators. Although previous studies demonstrate their effectiveness, they cannot be applied to tracking dynamic targets with unknown rotation (non-cooperative objects). In this paper, we propose a learning system for motion planning of a free-float dual-arm space manipulator (FFDASM) towards non-cooperative objects. Specifically, our method consists of two modules. Module I realizes multi-target trajectory planning for two end-effectors within a large target space. Module II takes as input the point clouds of the non-cooperative object to estimate its motion properties, and can then predict the position of target points on the non-cooperative object. We leverage the combination of Module I and Module II to successfully track target points on a spinning object with unknown regularity. Furthermore, the experiments demonstrate the scalability and generalization of our learning system.
