Robotics: Science and Systems XVI 2020
DOI: 10.15607/rss.2020.xvi.001
Planning and Execution using Inaccurate Models with Provable Guarantees

Abstract: We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy under the true dynamics. This objective demonstrates that …

Cited by 12 publications (15 citation statements)
References 25 publications
“…In contrast, we explicitly target this problem. A related problem was also addressed in (21), which made local adjustments in response to inaccurate predictions encountered in execution. This strategy is complementary to our approach of avoiding areas of predicted model inaccuracy in planning.…”
Section: Related Work
confidence: 99%
“…Recent works such as CMAX (Vemula et al. 2020) and (McConachie et al. 2020) pursue an alternative approach that does not require updating the dynamics of the model or learning a residual component. These approaches exhibit goal-driven behavior by focusing on completing the task rather than on modeling the true dynamics accurately.…”
Section: Related Work
confidence: 99%
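The goal-driven strategy described in the excerpt above (penalizing transitions the model is known to predict inaccurately, rather than relearning the dynamics) can be sketched as follows. This is a minimal illustration, not the cited papers' implementation: the grid world, the penalty value, and the Dijkstra planner are all assumptions made for the example.

```python
import heapq

# Toy grid shortest-path problem: 3x3 grid, goal at (2, 2),
# four move actions, each with unit cost.
GOAL = (2, 2)
ACTIONS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def model_step(s, a):
    """Deterministic model dynamics f(s, a): move, clipped to the grid."""
    dx, dy = ACTIONS[a]
    return (min(2, max(0, s[0] + dx)), min(2, max(0, s[1] + dy)))

def plan(start, incorrect):
    """Dijkstra search that inflates the cost of any (state, action) pair
    flagged as inaccurately modeled, so plans route around those
    transitions instead of waiting for the model to be corrected."""
    PENALTY = 1e6  # effectively removes flagged transitions from plans
    dist, parent = {start: 0.0}, {}
    frontier = [(0.0, start)]
    while frontier:
        d, s = heapq.heappop(frontier)
        if s == GOAL:
            break
        if d > dist[s]:
            continue  # stale queue entry
        for a in ACTIONS:
            cost = PENALTY if (s, a) in incorrect else 1.0
            s2 = model_step(s, a)
            if d + cost < dist.get(s2, float("inf")):
                dist[s2], parent[s2] = d + cost, (s, a)
                heapq.heappush(frontier, (dist[s2], s2))
    actions, s = [], GOAL  # reconstruct the action sequence
    while s != start:
        s, a = parent[s]
        actions.append(a)
    return actions[::-1]

print(plan((0, 0), incorrect=set()))
print(plan((0, 0), incorrect={((1, 0), "R"), ((0, 1), "R")}))
```

Here the flagged pairs are discovered during execution whenever a real outcome disagrees with `model_step`; replanning with the inflated costs then avoids them, which is goal-driven in exactly the sense the excerpt describes: the task still gets completed without ever making the model itself more accurate.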
“…Following the notation of (Vemula et al. 2020), we consider the deterministic shortest path problem represented by the tuple M = (S, A, G, f, c), where S is the state space, A is the action space, G ⊆ S is the non-empty set of goals, f : S × A → S is a deterministic dynamics function, and c : S × A → R+ ∪ {0} is the cost function. Crucially, our approach assumes that the action space A is discrete and that any goal state g ∈ G is a cost-free termination state.…”
Section: Problem Setup
confidence: 99%
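The tuple M = (S, A, G, f, c) quoted above can be sketched as a minimal data structure. The class and field names below are illustrative assumptions, not from the paper; the sketch just makes the components and the cost-free-goal convention concrete.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Tuple

State = Hashable
Action = Hashable

@dataclass(frozen=True)
class ShortestPathProblem:
    """Deterministic shortest-path tuple M = (S, A, G, f, c)."""
    states: FrozenSet[State]                      # S: state space
    actions: Tuple[Action, ...]                   # A: discrete action space
    goals: FrozenSet[State]                       # G ⊆ S: goal states
    dynamics: Callable[[State, Action], State]    # f: S × A → S
    cost: Callable[[State, Action], float]        # c: S × A → R+ ∪ {0}

    def step_cost(self, s: State, a: Action) -> float:
        # Goal states are cost-free termination states.
        return 0.0 if s in self.goals else self.cost(s, a)

# Example: states 0..3 on a line, goal 3, unit cost per move.
M = ShortestPathProblem(
    states=frozenset(range(4)),
    actions=(-1, +1),
    goals=frozenset({3}),
    dynamics=lambda s, a: max(0, min(3, s + a)),
    cost=lambda s, a: 1.0,
)
print(M.step_cost(0, +1))  # 1.0
print(M.step_cost(3, +1))  # 0.0 (goal is cost-free)
```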
“…We can view the abstraction repair problem defined in Section III-B as a model update/repair problem. Though this is a rich area of research [1,8,21,28,30,33,38], we learn a different kind of model than those typically considered. Vemula et al. [38] learn model corrections online for planning with inaccurate models, but their model updates bias the planner away from poorly modeled states rather than improving the correspondence between the model and the modeled controller.…”
Section: B. Abstraction Repair
confidence: 99%