Numerous alternative HRL techniques have been proposed (e.g., Ring, 1991, 1994; Jameson, 1991; Tenenberg et al., 1993; Weiss, 1994; Moore and Atkeson, 1995; Precup et al., 1998; Dietterich, 2000b; Menache et al., 2002; Doya et al., 2002; Ghavamzadeh and Mahadevan, 2003; Barto and Mahadevan, 2003; Samejima et al., 2003; Bakker and Schmidhuber, 2004; Whiteson et al., 2005; Simsek and Barto, 2008). While HRL frameworks such as Feudal RL (Dayan and Hinton, 1993) and options (Sutton et al., 1999b; Barto et al., 2004; Singh et al., 2005) do not directly address the problem of automatic subgoal discovery, HQ-Learning (Wiering and Schmidhuber, 1998a) automatically decomposes POMDPs (Sec. 6.3) into sequences of simpler subtasks, each solvable by a memoryless policy learnable by a reactive sub-agent.
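The control-transfer idea behind this kind of decomposition can be illustrated with a deliberately simplified sketch: a fixed sequence of tabular Q-learning sub-agents, each with its own memoryless (observation-to-action) policy, where control passes to the next sub-agent once the current subgoal is reached. The corridor environment, the fixed subgoal sequence, and all hyperparameters below are illustrative assumptions, not taken from Wiering and Schmidhuber's algorithm (which, in particular, also learns the subgoal sequence itself via separate HQ-tables, and operates on genuinely partially observable tasks; this toy corridor is fully observable for brevity).

```python
import random

random.seed(0)

# Toy corridor: states 0..N-1, start at 0, final goal at N-1.
# Hypothetical setup: two sub-agents with a FIXED subgoal sequence;
# HQ-Learning proper learns the subgoals, which we skip here.
N = 8
SUBGOALS = [4, N - 1]         # illustrative subgoals, not from the paper
ACTIONS = [-1, +1]            # step left / step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

# One memoryless Q-table (observation -> action values) per sub-agent.
Q = [[[0.0] * len(ACTIONS) for _ in range(N)] for _ in SUBGOALS]

for _ in range(300):          # training episodes
    pos = 0
    for k, goal in enumerate(SUBGOALS):   # sub-agents take control in turn
        for _ in range(100):
            if pos == goal:
                break                     # subgoal reached: hand over control
            if random.random() < EPS:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[k][pos][i])
            nxt = min(max(pos + ACTIONS[a], 0), N - 1)
            r = 1.0 if nxt == goal else -0.01   # small step cost, subgoal bonus
            Q[k][pos][a] += ALPHA * (r + GAMMA * max(Q[k][nxt]) - Q[k][pos][a])
            pos = nxt

# Greedy rollout: the chained reactive policies should reach the final goal.
pos, trace = 0, [0]
for k, goal in enumerate(SUBGOALS):
    for _ in range(50):
        if pos == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: Q[k][pos][i])
        pos = min(max(pos + ACTIONS[a], 0), N - 1)
        trace.append(pos)
print(trace)
```

The design point the sketch isolates is that no single sub-agent needs memory: each one only has to map its current observation to an action until its subgoal fires, at which point the next sub-agent's reactive policy takes over, so a sequence of memoryless policies can solve a task that a single memoryless policy could not (in a genuinely partially observable version of the problem).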