Proceedings of the 25th International Conference on Machine Learning (ICML '08), 2008
DOI: 10.1145/1390156.1390238
Automatic discovery and transfer of MAXQ hierarchies

Abstract: We present an algorithm, HI-MAT (Hierarchy Induction via Models And Trajectories), that discovers MAXQ task hierarchies by applying dynamic Bayesian network models to a successful trajectory from a source reinforcement learning task. HI-MAT discovers subtasks by analyzing the causal and temporal relationships among the actions in the trajectory. Under appropriate assumptions, HI-MAT induces hierarchies that are consistent with the observed trajectory and have compact value-function tables employing safe state …
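To make the abstract's idea concrete, the following is an illustrative sketch, not the paper's algorithm: it partitions a trajectory into candidate subtask segments by grouping consecutive actions whose relevant state variables overlap. The variable-relevance sets stand in for what HI-MAT would derive from DBN action models, and all names here (the `partition_trajectory` helper, the toy taxi-like actions) are hypothetical.

```python
def partition_trajectory(trajectory, relevant_vars):
    """Group consecutive actions whose relevant-variable sets overlap.

    trajectory    -- list of action names, in execution order
    relevant_vars -- dict mapping action -> set of state variables that the
                     action's (assumed) DBN model marks as relevant
    """
    segments = []
    current = [trajectory[0]]
    for action in trajectory[1:]:
        # Start a new segment when the action shares no relevant variable
        # with the previous one (a crude stand-in for a causal-link test).
        if relevant_vars[action] & relevant_vars[current[-1]]:
            current.append(action)
        else:
            segments.append(current)
            current = [action]
    segments.append(current)
    return segments

# Toy taxi-like trajectory: navigation actions touch only the location
# variable, while pickup/putdown touch only the passenger variable.
trajectory = ["north", "east", "pickup", "south", "putdown"]
relevant_vars = {
    "north": {"loc"}, "east": {"loc"}, "south": {"loc"},
    "pickup": {"passenger"}, "putdown": {"passenger"},
}

print(partition_trajectory(trajectory, relevant_vars))
# -> [['north', 'east'], ['pickup'], ['south'], ['putdown']]
```

Each resulting segment would correspond to a candidate subtask; HI-MAT's actual analysis is considerably richer, using backward causal reasoning over DBN models rather than this simple pairwise overlap check.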

Cited by 46 publications (50 citation statements)
References 8 publications
“…Dietterich (2000) proposed the MAXQ framework, which uses several layers of such sub-tasks. However, the structure of these sub-tasks must either be specified by the user (Dietterich 2000) or relies on the availability of a successful trajectory (Mehta et al. 2008). Barto et al. (2004) rely on artificial curiosity to define the reward signal of individual sub-tasks, where the agent aims to maximize its knowledge of the environment to solve new tasks quicker.…”
Section: Related Work
confidence: 99%
“…For example, one could investigate how to discover the interaction structure (hierarchy) and the reward function throughout the interaction. A method for hierarchy discovery is described in [Mehta et al. 2008]. (2) Investigate when to relearn interaction policies.…”
Section: Discussion
confidence: 99%
“…Mehta et al [13] have a transfer method that works directly within the hierarchical RL framework. They learn a task hierarchy by observing successful behavior in a source task, and then use it to apply the MaxQ hierarchical RL algorithm [4] in the target task.…”
Section: Hierarchical Methodsmentioning
confidence: 99%