2022 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra46639.2022.9811652

Exploiting Abstract Symmetries in Reinforcement Learning for Complex Environments

Cited by 3 publications (6 citation statements)
References 10 publications
“…Meanwhile, the target networks are used to improve the stability of this approximation. Besides, the IBC-DMP agent is equipped with a dual-buffer structure, which is inspired by the previous work on off-policy RL [47], [58]. The demo buffer is used to store the demonstration data of the human motion recorded in Sec.…”
Section: A. Overview of the Training Methods for IBC-DMP RL
confidence: 99%
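The dual-buffer structure mentioned in this citation can be pictured as a fixed demonstration buffer alongside an ordinary replay buffer, with each training batch drawn partly from each. The following is a minimal sketch of such an arrangement; the class name, the `demo_fraction` ratio, and the sampling scheme are illustrative assumptions, not the cited paper's exact design.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Minimal sketch of a dual-buffer structure: a fixed demo buffer
    holding human demonstrations plus a replay buffer for agent
    experience, mixed together when sampling training batches."""

    def __init__(self, capacity=100_000, demo_fraction=0.25):
        self.demo = []                        # demonstration transitions, filled once
        self.agent = deque(maxlen=capacity)   # standard off-policy replay buffer
        self.demo_fraction = demo_fraction    # share of each batch taken from demos

    def add_demo(self, transition):
        self.demo.append(transition)

    def add_agent(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        n_demo = min(len(self.demo), int(batch_size * self.demo_fraction))
        n_agent = min(len(self.agent), batch_size - n_demo)
        batch = (random.sample(self.demo, n_demo)
                 + random.sample(list(self.agent), n_agent))
        random.shuffle(batch)
        return batch
```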
“…This method, however, renders complete separation between BC and agent training, such that BC is not helpful in improving the performance of RL. A recent study proposed a novel method to integrate BC into the training process of an RL agent, which greatly improves the convergence speed [34]. However, the demonstration used for BC is generated by a PID controller in a simulation environment, instead of real human data.…”
Section: *Corresponding Author
confidence: 99%
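One common way to couple BC with the RL update, in the spirit of the integration this citation describes, is to add a behavior-cloning regression term to the actor loss so that demonstrations shape the policy throughout training. The PyTorch-style sketch below illustrates that idea; the function name, the `bc_weight` coefficient, and the deterministic-policy-gradient form are assumptions for illustration, not the method of [34].

```python
import torch
import torch.nn.functional as F

def actor_loss_with_bc(actor, critic, states, demo_states, demo_actions,
                       bc_weight=1.0):
    """Sketch: deterministic policy-gradient actor loss plus a
    behavior-cloning term on demonstration state-action pairs."""
    # Standard actor objective: push pi(s) toward actions the critic rates highly.
    rl_loss = -critic(states, actor(states)).mean()

    # BC term: regress the policy toward the demonstrated actions.
    bc_loss = F.mse_loss(actor(demo_states), demo_actions)

    return rl_loss + bc_weight * bc_loss
```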
“…Similarly to DL, RL, which is aimed at sequential decision-making, can be employed for motion planning in unfamiliar environments. It can resolve high-dimensional problems involving dynamic obstacles by taking into account their location over a limited number of timestamps within the past horizon [108].…”
Section: Reinforcement Learning
confidence: 99%
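A simple way to take obstacle locations "over a limited number of timestamps within the past horizon" into account is to stack the most recent obstacle-position snapshots into the observation vector fed to the RL planner. The sketch below shows one such construction; the class name, horizon length, and 2-D position encoding are assumptions, not details from [108].

```python
import numpy as np
from collections import deque

class ObstacleHistoryObservation:
    """Sketch: keep the last `horizon` obstacle-position snapshots and
    flatten them into a fixed-size observation for an RL motion planner."""

    def __init__(self, num_obstacles, horizon=4):
        self.history = deque(
            [np.zeros((num_obstacles, 2))] * horizon, maxlen=horizon
        )

    def update(self, obstacle_positions):
        # obstacle_positions: array of shape (num_obstacles, 2) with (x, y).
        self.history.append(np.asarray(obstacle_positions, dtype=float))

    def observation(self):
        # Concatenate the past-horizon snapshots into one flat vector.
        return np.concatenate([snap.ravel() for snap in self.history])
```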