2011
DOI: 10.1609/aimag.v32i1.2342

Automatic Discovery and Transfer of Task Hierarchies in Reinforcement Learning

Abstract: Sequential decision tasks present many opportunities for the study of transfer learning. A principal one among them is the existence of multiple domains that share the same underlying causal structure for actions. We describe an approach that exploits this shared causal structure to discover a hierarchical task structure in a source domain, which in turn speeds up learning of task execution knowledge in a new target domain. Our approach is theoretically justified and compares favorably to manually designed task…

Cited by 7 publications
(19 citation statements)
References 22 publications
“…Thus, s and correspond to different levels of a hierarchical situation model, and HRL provides methods to optimize option policies. While most HRL approaches assume a given pre-designed hierarchical structure [32, 36, 42, 57] or only bottom-up learning from the level of primitive states [53, 54, 58], our approach targets at general structural learning of behavioral and situation models by extending “is-a” and “has-parts” ontologies of situation models, including both specialization and generalization [16–18, 40]. …”
Section: Summary and Discussion (citation type: mentioning)
confidence: 99%
“…Another promising approach that has been drawing much research interest is discovering the hierarchical structure automatically from state-action histories in the environment, either online or offline [Hengst 2002; Stolle 2004; Bakker and Schmidhuber 2004; Şimşek et al 2005; Mehta et al 2008, 2011]. For example, Mehta et al [2008] presents hierarchy induction via models and trajectories (HI-MAT), which discovers MAXQ task hierarchies by applying DBN models to successful execution trajectories of a source MDP task; the HEXQ [Hengst 2002, 2004] method decomposes MDPs by finding nested sub-MDPs where there are policies to reach any exit with certainty; and Stolle [2004] performs automatic hierarchical decomposition by taking advantage of the factored representation of the underlying problem.…”
Section: Discussion: Maxq-op Algorithm (citation type: mentioning)
confidence: 99%
“…Some of the current HRL methods which are based on extracting the task-dependent hierarchy in FMDPs include HEX-Q [21], variable influence structure analysis (VISA) [22], and hierarchy induction via models and trajectories (HI-MAT) [23], [24]. Since there are implicit structure representations of the problems among the state variables in FMDPs, DBNs as a high-level source of pre-knowledge are often used to decompose the tasks in such processes, noting their capability to extract the impact of each action on the state variables.…”
Section: B Hrl Methods In Factored Mdps (Fmdps) (citation type: mentioning)
confidence: 99%
“…The state variables that affect others are assigned to deeper levels in the hierarchy. HI-MAT and VISA algorithms rely on the availability of DBNs for each action [22]-[24]. Since VISA considers the impacts of all actions regardless of the domain, it can create unnecessary branches in the extracted hierarchy or unnecessary sub-tasks.…”
Section: B Hrl Methods In Factored Mdps (Fmdps) (citation type: mentioning)
confidence: 99%