2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021
DOI: 10.1109/iros51168.2021.9636681

A Multi-Target Trajectory Planning of a 6-DoF Free-Floating Space Robot via Reinforcement Learning

Abstract: Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Buil…
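
As an illustration of the single-state TD target the abstract describes, here is a minimal Python sketch of the textbook one-step update. It is not the MSTD target, whose definition is cut off above; the function names and the `alpha` and `gamma` parameters are assumptions chosen for the example.

```python
def td_target(reward, next_value, gamma=0.99, done=False):
    """One-step TD target: immediate reward plus the discounted
    estimated value of the single subsequent state.
    No bootstrapping from a terminal state."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap


def td_update(value, reward, next_value, alpha=0.1, gamma=0.99, done=False):
    """Move the current value estimate a fraction alpha toward the TD target."""
    target = td_target(reward, next_value, gamma, done)
    return value + alpha * (target - value)
```

With `reward=1.0`, `next_value=2.0`, and `gamma=0.5`, the target is 1.0 + 0.5 × 2.0 = 2.0, and an estimate of 0.0 updated with `alpha=0.5` moves halfway to it.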

Cited by 15 publications (11 citation statements)
References 20 publications
“…For the free-floating space manipulator, Yan et al. proposed a trajectory planning method based on Soft Q-learning for a 3-DoF free-floating space robot [17]. To reach multiple targets within a large space, Wang et al. developed an improved version of Proximal Policy Optimization (PPO) for a 6-DoF space robot [25]. However, in our experiments, the method cannot work well in a 12-DoF dual-arm environment.…”
Section: Related Work (mentioning)
confidence: 84%
“…Then a velocity tracking PD controller takes a_t as input, to generate the torques of joints. Given that position controller performs less smoothly, it is better to choose the velocity controller [25].…”
Section: Formulation Of Optimization Problem (mentioning)
confidence: 99%
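
The velocity-tracking PD controller mentioned in the statement above can be sketched for a single joint as follows. The citing paper's actual control law and gains are not given here, so the class name, the gains `kp` and `kd`, the time step `dt`, and the finite-difference derivative are all assumptions made for illustration.

```python
class VelocityPD:
    """Hypothetical single-joint PD controller that tracks a desired velocity."""

    def __init__(self, kp, kd, dt):
        self.kp = kp            # proportional gain on the velocity error
        self.kd = kd            # derivative gain on the change of that error
        self.dt = dt            # control period in seconds
        self.prev_error = 0.0   # error from the previous control step

    def torque(self, v_desired, v_measured):
        """Return a joint torque from the velocity tracking error."""
        error = v_desired - v_measured
        d_error = (error - self.prev_error) / self.dt  # finite-difference derivative
        self.prev_error = error
        return self.kp * error + self.kd * d_error
```

Here the policy output a_t would play the role of `v_desired`; a multi-joint robot would run one such controller per joint, or a vectorized equivalent.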