2022 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra46639.2022.9811660

OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Cited by 4 publications (4 citation statements)
References: 9 publications
Citation statements (ordered by relevance):
“…MB-ERIL is one instantiation of this approach. Recent approaches corrected the issues around reusing data previously collected during training the discriminator [17], [18].…”
Section: Literature Review, A. Model-free Imitation Learning
Citation type: mentioning (confidence: 99%)
“…MB-ERIL is one instantiation of this approach. Recent approaches corrected the issues around reusing data previously collected while training the discriminator [17], [18].…”
Section: A. Model-free Imitation Learning
Citation type: mentioning (confidence: 99%)
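For context on what "reusing data previously collected" means in practice: off-policy adversarial imitation/IRL methods train the discriminator on transitions drawn from a replay buffer rather than only on fresh on-policy rollouts. Below is a minimal PyTorch sketch of that idea; the network sizes, uniform replay sampling, and the stand-in expert_data are illustrative assumptions, not details from the cited papers.

```python
# Hedged sketch: discriminator training that reuses previously collected
# policy data via a replay buffer (not the cited papers' implementation).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # illustrative dimensions

# Discriminator over state-action pairs: high logit = "looks like the expert".
disc = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

replay_buffer = deque(maxlen=100_000)  # every transition the learner has produced
expert_data = [torch.randn(STATE_DIM + ACTION_DIM) for _ in range(512)]  # stand-in demos

def store(state, action):
    """Keep old policy data around instead of discarding it after each update."""
    replay_buffer.append(torch.cat([state, action]))

def discriminator_update(batch_size=64):
    """One off-policy step: expert samples vs. samples replayed from the buffer."""
    if len(replay_buffer) < batch_size:
        return
    policy_batch = torch.stack(random.sample(list(replay_buffer), batch_size))
    expert_batch = torch.stack(random.sample(expert_data, batch_size))
    logits = disc(torch.cat([expert_batch, policy_batch]))
    labels = torch.cat([torch.ones(batch_size, 1),    # expert -> 1
                        torch.zeros(batch_size, 1)])  # replayed policy -> 0
    loss = bce(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Toy usage with random "environment" interactions.
for _ in range(256):
    store(torch.randn(STATE_DIM), torch.randn(ACTION_DIM))
discriminator_update()
```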
“…One of the angles in this realm is to recover a robust reward [11,28,29,30]. The AIRL algorithm [11] provides for simultaneous learning of the reward and value function, which is robust to changes in dynamics.…”
Section: Reward Recovery
Citation type: mentioning (confidence: 99%)
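For context, the "simultaneous learning of the reward and value function" refers to the specific structure of the AIRL discriminator, recalled here from the AIRL paper in standard notation (not quoted from this report):

$$
D_{\theta,\phi}(s,a,s') = \frac{\exp f_{\theta,\phi}(s,a,s')}{\exp f_{\theta,\phi}(s,a,s') + \pi(a \mid s)},
\qquad
f_{\theta,\phi}(s,a,s') = g_\theta(s,a) + \gamma\, h_\phi(s') - h_\phi(s),
$$

where g_\theta plays the role of the recovered reward and h_\phi acts as a potential-based shaping (value-like) term; learning them jointly is what allows the recovered reward to be disentangled from, and hence robust to, changes in the dynamics.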
“…Based on the idea of distribution matching and AIRL, Hoshino et al [29] formulated off-policy inverse reinforcement learning (OPIRL), which not only improves sample efficiency but is able to generalize to unseen environments. Further, receding-horizon inverse reinforcement learning (RHIRL) [23] shows its superiority in scalability and robustness for high-dimensional, noisy, continuous systems with black-box dynamic models.…”
Section: Reward Recovery
Citation type: mentioning (confidence: 99%)
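To make the "distribution matching + off-policy" combination attributed to OPIRL concrete, the sketch below shows how a learned discriminator can be turned into a reward that relabels transitions sampled from a replay buffer for an off-policy actor-critic update. The reward form (the discriminator logit, i.e. log D - log(1 - D)) and the dimensions are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch: relabeling replayed transitions with a discriminator-derived
# reward so an off-policy learner (e.g. a SAC-style critic) can use old data.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # illustrative dimensions
disc = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

def adversarial_reward(disc, states, actions):
    """AIRL/GAIL-family reward shape: log D - log(1 - D), i.e. the raw logit."""
    logits = disc(torch.cat([states, actions], dim=-1))
    return logits.detach()  # treated as a fixed reward during the RL update

# Off-policy usage: transitions sampled from the replay buffer get their reward
# recomputed with the *current* discriminator, so stale data remains useful
# as the learned reward improves.
states, actions = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)
rewards = adversarial_reward(disc, states, actions)  # shape: [64, 1]
```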