2018 IEEE International Conference on Robotics and Automation (ICRA) 2018
DOI: 10.1109/icra.2018.8460854

Active Reward Learning from Critiques

Cited by 48 publications (51 citation statements)
References 10 publications
“…Another approach for richer feedback could allow users to manually indicate, and potentially correct, the undesirable sections of presented paths. This idea is investigated by Cui and Niekum (2018), where users segment a robot's trajectory into good and bad parts.…”
Section: Discussion and Future Work
confidence: 99%
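The segment-level critiques described above can be made concrete with a small sketch. All names here are illustrative, not taken from the paper: a critique is a set of trajectory index ranges the user labels good or bad, which can be expanded into per-step labels for learning.

```python
# Hypothetical representation of critique-style feedback: a user marks
# index ranges of a trajectory as good (+1) or bad (-1).
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Critique:
    """User feedback: (start, end, label) ranges, label is +1 or -1."""
    segments: List[Tuple[int, int, int]]


def per_step_labels(traj_len: int, critique: Critique) -> List[int]:
    """Expand segment-level critiques into one label per step (0 = unlabeled)."""
    labels = [0] * traj_len
    for start, end, label in critique.segments:
        for t in range(start, min(end, traj_len)):
            labels[t] = label
    return labels


# Example: a 10-step trajectory where steps 0-3 are good and 6-9 are bad.
c = Critique(segments=[(0, 4, +1), (6, 10, -1)])
print(per_step_labels(10, c))  # → [1, 1, 1, 1, 0, 0, -1, -1, -1, -1]
```

Leaving unmarked steps as 0 keeps the "unlabeled" middle of the trajectory distinct from explicitly good or bad segments.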
“…Even though their method requires fewer action suggestions than simply receiving demonstrations in an arbitrary order, the agent must be able to freely change the state of the task to ask for guidance in the correct states, which is infeasible for most domains. Cui and Niekum (2018) use an idea very similar to that of Lopes et al. (2009) to move IRL closer to real applications. In their method, the advisee generates a trajectory that is expected to maximize the gain of knowledge, according to an uncertainty function similar to Lopes'.…”
Section: Inverse Reinforcement Learning
confidence: 99%
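The uncertainty-driven query generation described above can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: among candidate trajectories, the agent queries the one whose return varies most across sampled reward hypotheses, i.e. where its current belief is most uncertain.

```python
# Illustrative sketch of uncertainty-driven query selection: ask about the
# candidate trajectory where sampled reward functions disagree most.
import statistics


def trajectory_return(trajectory, reward_fn):
    """Sum a reward function over the states of a trajectory."""
    return sum(reward_fn(s) for s in trajectory)


def select_query(candidates, reward_samples):
    """Pick the trajectory whose return has the highest variance across
    sampled reward hypotheses (a simple disagreement-based uncertainty)."""
    def uncertainty(traj):
        returns = [trajectory_return(traj, r) for r in reward_samples]
        return statistics.pvariance(returns)
    return max(candidates, key=uncertainty)


# Toy example: states are numbers; hypotheses disagree about reward sign.
candidates = [[1, 2, 3], [0, 0, 0], [5, -5, 5]]
reward_samples = [lambda s: s, lambda s: -s, lambda s: 0.5 * s]
print(select_query(candidates, reward_samples))  # → [1, 2, 3]
```

Return variance over posterior samples is one common proxy for information gain; the actual uncertainty function in the cited work differs, but the query-selection structure is the same.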
“…Such data can be onerous and time-consuming for users to provide. Recent work on active learning for inverse RL has sought to reduce the required number of demonstrations [2,6,26,5], but still requires some number of demonstrations to be provided manually. Our method only requires a modest number of examples of successful outcomes, followed by binary queries in which the user indicates whether a particular outcome the robot achieved is successful or not.…”
Section: Related Work
confidence: 99%