2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793554

Learning from Extrapolated Corrections

Abstract: Our goal is to enable robots to learn cost functions from user guidance. Often it is difficult or impossible for users to provide full demonstrations, so corrections have emerged as an easier guidance channel. However, when robots learn cost functions from corrections rather than demonstrations, they have to extrapolate a small amount of information (the change of a waypoint along the way) to the rest of the trajectory. We cast this extrapolation problem as online function approximation, which exposes different …
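The extrapolation problem described in the abstract can be illustrated with a minimal sketch. The assumptions here are ours, not the paper's: a 1-D trajectory of interior waypoints whose start and goal are held fixed, a single correction that moves waypoint `index` by `delta`, and two candidate norms for the update, the identity (only the corrected waypoint moves) and a first-difference norm A = KᵀK = tridiag(-1, 2, -1), which spreads the change smoothly. The helper names `solve_tridiagonal` and `extrapolate` are hypothetical; this shows the norm-choice intuition, not the authors' algorithm.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    lower/upper hold the sub- and super-diagonals (length n - 1),
    diag holds the main diagonal (length n).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def extrapolate(traj, index, delta, norm="smooth"):
    """Propagate a single-waypoint correction to the whole trajectory.

    The update direction is A^{-1} e_index, rescaled so that the corrected
    waypoint moves by exactly `delta`. norm="identity" uses A = I;
    norm="smooth" uses A = K^T K with K a first-difference operator and
    implicit fixed endpoints, i.e. A = tridiag(-1, 2, -1).
    """
    n = len(traj)
    unit = [0.0] * n
    unit[index] = 1.0
    if norm == "identity":
        direction = unit
    elif norm == "smooth":
        direction = solve_tridiagonal([-1.0] * (n - 1), [2.0] * n,
                                      [-1.0] * (n - 1), unit)
    else:
        raise ValueError(f"unknown norm: {norm}")
    scale = delta / direction[index]
    return [w + scale * v for w, v in zip(traj, direction)]


straight = [0.0] * 5
print(extrapolate(straight, 2, 1.0, norm="identity"))  # [0.0, 0.0, 1.0, 0.0, 0.0]
print([round(v, 3) for v in extrapolate(straight, 2, 1.0)])  # [0.333, 0.667, 1.0, 0.667, 0.333]
```

Under the identity norm the correction stays local; under the first-difference norm the same correction becomes a tent-shaped deformation of the whole trajectory. In this toy setting, the choice of norm is what decides how one changed waypoint is extrapolated.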

Cited by 10 publications (10 citation statements). References 8 publications.
“…The concept of learning a hidden reward function from a user is widely used in various human-robot interaction frameworks, such as learning from demonstrations (LfD) [4], [17], learning from corrections [18], [19] and learning from preferences [1], [3], [12], [13], [17].…”
Section: A. Related Work (mentioning)
“…However, expert demonstrations (with or without noise) are often difficult to obtain in real-world tasks. More recently, researchers start focusing on learning with nonexpert feedback on the queries of the robot's behaviors, often in the forms of ratings (Daniel et al 2014), comparisons (Dorsa Sadigh, Sastry, and Seshia 2017), or critiques (Cui and Niekum 2018; Zhang and Dragan 2019). All these prior works rely on an implicit assumption that the non-expert user maintains a correct understanding of the robot's domain dynamics.…”
Section: Related Work (mentioning)
“…The dynamics settings and parameters follow the experiment in Section V-A, and the weight-feature cost function is set as (31). According to [17], for each of the human's corrections, we first utilize the trajectory deformation technique [18] to obtain the corresponding human intended trajectory. Specifically, given a correction $a_k$, the human intended trajectory, denoted as $\bar{\xi}^{\theta_k} = \{\bar{x}^{\theta_k}_{0:T+1}, \bar{u}^{\theta_k}_{0:T}\}$, can be solved by…”
Section: Comparison With Related Work (mentioning)
“…To handle the sparse corrections that a human user applies only at sparse time instances during the robot's motion, these methods apply the trajectory deformation technique [20] to interpret each single-time-step correction through a human intended trajectory, i.e., a deformed robot trajectory. Although achieving promising results, choosing the hyper-parameters in the trajectory deformation is challenging, which can affect the learning performance [18]. In addition, these methods have not provided any convergence guarantee of the learning process.…”
Section: Introduction (mentioning)
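The quoted statements describe turning a single-time-step correction into a human intended trajectory via trajectory deformation, and note that its hyperparameters are hard to choose. A simplified sketch, under assumptions of our own: a 1-D trajectory, a correction `force` applied at time step `start`, a deformation window of `window` waypoints with fixed boundaries, a velocity-smoothness matrix A = tridiag(-1, 2, -1) standing in for whatever smoothness norm the cited technique [20] actually uses, and a shape vector H rescaled so that ||H|| = sqrt(window). The helpers `deformation_shape` and `deform` and the step size `mu` are hypothetical.

```python
import math


def deformation_shape(window):
    """Normalized deformation direction H for a push at the window's first waypoint.

    Uses the closed-form inverse of A = tridiag(-1, 2, -1) (size `window`,
    fixed ends): (A^{-1})[i][j] = min(i, j) * (window + 1 - max(i, j)) / (window + 1),
    1-indexed. With j = 1 the column is (window + 1 - i) / (window + 1).
    """
    column = [(window + 1 - i) / (window + 1) for i in range(1, window + 1)]
    length = math.sqrt(sum(v * v for v in column))
    return [math.sqrt(window) * v / length for v in column]


def deform(traj, start, force, mu, window):
    """Return a copy of traj with traj[start:start + window] shifted by mu * force * H."""
    deformed = list(traj)
    for offset, h in enumerate(deformation_shape(window)):
        t = start + offset
        if t < len(deformed):  # waypoints past the end of the trajectory are skipped
            deformed[t] += mu * force * h
    return deformed


nominal = [0.1 * t for t in range(10)]  # a straight-line robot trajectory
short = deform(nominal, start=2, force=1.0, mu=0.1, window=3)
long_ = deform(nominal, start=2, force=1.0, mu=0.1, window=6)
print([round(a - b, 3) for a, b in zip(short, nominal)])
print([round(a - b, 3) for a, b in zip(long_, nominal)])
```

The same correction produces different intended trajectories for window=3 and window=6, and the displacement scales linearly with `mu`. Any learner that computes features on the intended trajectory would therefore see different inputs depending on these settings, which is one concrete way the hyperparameter sensitivity mentioned above can arise.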