2015
DOI: 10.1007/978-3-319-14803-8_18
Learning Options for an MDP from Demonstrations

Abstract: The options framework provides a foundation for using hierarchical actions in reinforcement learning. An agent using options can, at any point in time, decide to execute a macro-action composed of many primitive actions rather than a single primitive action. Such macro-actions can be hand-crafted or learned. Previous work has learned them by exploring the environment. Here we take a different perspective and present an approach to learn options from a set of expert demonstrations…
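
For readers less familiar with the options framework (Sutton, Precup & Singh, 1999) that the abstract builds on, the sketch below shows the three components an option bundles together: an initiation set, an intra-option policy over primitive actions, and a termination condition. The class and field names are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

State = int
Action = int

@dataclass
class Option:
    """A temporally extended action in the options framework."""
    initiation_set: Set[State]              # states where the option may be invoked
    policy: Dict[State, Action]             # intra-option policy over primitive actions
    termination: Callable[[State], float]   # beta(s): probability of terminating in state s

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set

# Example: a hand-crafted "go to state 3" option on a small chain MDP.
goto_3 = Option(
    initiation_set={0, 1, 2},
    policy={0: 1, 1: 1, 2: 1},              # always take action 1 ("move right")
    termination=lambda s: 1.0 if s == 3 else 0.0,
)
```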

Cited by 4 publications (4 citation statements) · References 14 publications
“…Key areas for future work include (1) relaxing the assumption that controllers are provided; (2) learning better abstractions from even fewer demonstrations by performing active learning to gather more data online; (3) expanding the expressivity of the grammar to learn more sophisticated predicates; and (4) applying the ideas presented in this paper to non-deterministic and/or partially observed planning problems. For (1), the literature on option learning [89,90] provides a starting point; in our setting, it would be necessary to not only learn the initiation sets, policies, and termination conditions of the options [51], but also segment the demonstrations appropriately once options have been learned, since now these demonstrations would not contain actions. For (2), we hope to investigate how relational exploration algorithms [53,91] might be useful as a mechanism for an agent to decide what actions to execute, toward the goal of building better state and action abstractions.…”
Section: Discussion
confidence: 99%
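
The excerpt above notes that, once options have been learned, the demonstrations would still need to be segmented even though they no longer contain actions. The snippet below is a hypothetical illustration of one simple way to do that, cutting a state-only trajectory wherever a learned termination condition fires; it is not the procedure of the cited works, and the function and argument names are invented for this example.

```python
from typing import Callable, List, Sequence, Tuple

State = int

def segment_demonstration(states: Sequence[State],
                          terminates: Callable[[State], bool]) -> List[Tuple[int, int]]:
    """Split a state-only demonstration into (start, end) index pairs,
    cutting whenever the currently running option would terminate."""
    segments: List[Tuple[int, int]] = []
    start = 0
    for t, s in enumerate(states):
        if terminates(s):
            segments.append((start, t))                 # inclusive segment boundaries
            start = t + 1
    if start < len(states):
        segments.append((start, len(states) - 1))       # trailing, unterminated segment
    return segments

# Example: terminate whenever the trajectory reaches state 2 or state 5.
print(segment_demonstration([0, 1, 2, 3, 4, 5], lambda s: s in {2, 5}))
# [(0, 2), (3, 5)]
```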
“…Tamassia et al. (2016) suggest a different approach: dynamically selecting state-space abstraction by which different states that share the same abstraction features are considered similar. Sequeira et al.…”
Section: Preliminaries and Background
confidence: 99%
“…While the above methods consider the expert data at a global scale, our work is concerned with the problem of subgoal modeling, which is often conducted in the form of option-based reasoning (Sutton et al., 1999). For instance, Tamassia et al. (2015) proposed a clustering approach based on state distances to find a minimal set of options that can explain the expert behavior. While the method provides a simple alternative to handcrafting options, it does not allow any probabilistic treatment of the data and involves many ad-hoc design choices.…”
Section: Related Work
confidence: 99%
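
As a rough illustration of the distance-based clustering idea mentioned in the excerpt above (and not the specific procedure of Tamassia et al., 2015), the sketch below clusters demonstration states with a toy k-means and returns the cluster centres as candidate subgoals around which options could be built; all names and the choice of k are assumptions made for the example.

```python
import numpy as np

def propose_subgoals(demo_states: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Toy k-means over demonstration states (rows are state feature vectors).
    Returns k cluster centres as candidate subgoal locations."""
    rng = np.random.default_rng(0)
    centres = demo_states[rng.choice(len(demo_states), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each demonstration state to its nearest centre
        dists = np.linalg.norm(demo_states[:, None, :] - centres[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of the states assigned to it
        for j in range(k):
            if np.any(labels == j):
                centres[j] = demo_states[labels == j].mean(axis=0)
    return centres

# Example: 2-D grid-world states visited by an expert, clustered into 2 subgoals.
demo = np.array([[0, 0], [1, 0], [2, 0], [7, 7], [8, 7], [8, 8]], dtype=float)
print(propose_subgoals(demo, k=2))
```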
“…, |S|} indicates the subgoal location and C ∈ (0, ∞) is some positive constant (compare Şimşek et al., 2005; Stolle and Precup, 2002; Tamassia et al., 2015).…”
Section: Revisiting the BNIRL Framework
confidence: 99%
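
For context on the truncated formula in the last excerpt, a subgoal reward of the kind it describes is plausibly of the following form; this reconstruction is an assumption based only on the quoted variable definitions (g indexing a state, C a positive constant), not a formula copied from the cited papers.

```latex
% Hypothetical subgoal reward implied by the excerpt: a constant bonus C at
% the subgoal state g and zero elsewhere.
R_g(s) =
\begin{cases}
  C & \text{if } s = g, \\
  0 & \text{otherwise,}
\end{cases}
\qquad g \in \{1, \dots, |S|\},\; C \in (0, \infty).
```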