Proceedings of the 25th International Conference on Machine Learning (ICML '08), 2008
DOI: 10.1145/1390156.1390238
Automatic discovery and transfer of MAXQ hierarchies

Abstract: We present an algorithm, HI-MAT (Hierarchy Induction via Models And Trajectories), that discovers MAXQ task hierarchies by applying dynamic Bayesian network models to a successful trajectory from a source reinforcement learning task. HI-MAT discovers subtasks by analyzing the causal and temporal relationships among the actions in the trajectory. Under appropriate assumptions, HI-MAT induces hierarchies that are consistent with the observed trajectory and have compact value-function tables employing safe state …
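To make the abstract's idea concrete, the following is an illustrative sketch, not the paper's algorithm: it partitions a trajectory into candidate subtask segments by grouping consecutive actions whose relevant state variables overlap. The variable-relevance sets stand in for what HI-MAT would derive from DBN action models, and all names here (the `partition_trajectory` helper, the toy taxi-like actions) are hypothetical.

```python
def partition_trajectory(trajectory, relevant_vars):
    """Group consecutive actions whose relevant-variable sets overlap.

    trajectory    -- list of action names, in execution order
    relevant_vars -- dict mapping action -> set of state variables that the
                     action's (assumed) DBN model marks as relevant
    """
    segments = []
    current = [trajectory[0]]
    for action in trajectory[1:]:
        # Start a new segment when the action shares no relevant variable
        # with the previous one (a crude stand-in for a causal-link test).
        if relevant_vars[action] & relevant_vars[current[-1]]:
            current.append(action)
        else:
            segments.append(current)
            current = [action]
    segments.append(current)
    return segments

# Toy taxi-like trajectory: navigation actions touch only the location
# variable, while pickup/putdown touch only the passenger variable.
trajectory = ["north", "east", "pickup", "south", "putdown"]
relevant_vars = {
    "north": {"loc"}, "east": {"loc"}, "south": {"loc"},
    "pickup": {"passenger"}, "putdown": {"passenger"},
}

print(partition_trajectory(trajectory, relevant_vars))
# -> [['north', 'east'], ['pickup'], ['south'], ['putdown']]
```

Each resulting segment would correspond to a candidate subtask; HI-MAT's actual analysis is considerably richer, using backward causal reasoning over DBN models rather than this simple pairwise overlap check.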

Cited by 46 publications (50 citation statements)
References 8 publications
“…Dietterich (2000) proposed the MAXQ framework, which uses several layers of such sub-tasks. However, the structure of these sub-tasks must either be specified by the user (Dietterich 2000) or relies on the availability of a successful trajectory (Mehta et al. 2008). Barto et al. (2004) rely on artificial curiosity to define the reward signal of individual sub-tasks, where the agent aims to maximize its knowledge of the environment to solve new tasks quicker.…”
Section: Related Work
confidence: 99%
“…For example, one could investigate how to discover the interaction structure (hierarchy) and the reward function throughout the interaction. A method for hierarchy discovery is described in [Mehta et al. 2008]. (2) Investigate when to relearn interaction policies.…”
Section: Discussion
confidence: 99%
“…Mehta et al [13] have a transfer method that works directly within the hierarchical RL framework. They learn a task hierarchy by observing successful behavior in a source task, and then use it to apply the MaxQ hierarchical RL algorithm [4] in the target task.…”
Section: Hierarchical Methodsmentioning
confidence: 99%