1999
DOI: 10.1016/s0004-3702(99)00052-1

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

Abstract: Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options: closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a d…
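To make the abstract's notion of an option concrete: the paper characterizes an option by an initiation set, an intra-option policy, and a termination condition, and executing an option induces a semi-MDP transition over multiple time steps. The Python sketch below is a minimal illustration of that idea only; the Option fields, run_option helper, and the env.step interface are hypothetical stand-ins, not the paper's implementation.

# Minimal sketch of the options abstraction (initiation set I, policy pi,
# termination condition beta). All names and the environment API are
# illustrative assumptions, not from the paper.
from dataclasses import dataclass
from typing import Any, Callable
import random

@dataclass
class Option:
    can_start: Callable[[Any], bool]          # initiation set I: states where the option may begin
    policy: Callable[[Any], Any]              # pi: maps a state to a primitive action
    termination_prob: Callable[[Any], float]  # beta: probability of terminating in a state

def run_option(env, option: Option, state, gamma: float = 0.99):
    """Execute an option until it terminates, returning the discounted return,
    the resulting state, and the number of elapsed steps (an SMDP transition)."""
    assert option.can_start(state)
    total_reward, discount, steps = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward, done = env.step(action)  # hypothetical environment interface
        total_reward += discount * reward
        discount *= gamma
        steps += 1
        if done or random.random() < option.termination_prob(state):
            return total_reward, state, steps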

Citations: cited by 2,099 publications (1,874 citation statements)
References: 51 publications (63 reference statements)
“…Numerous alternative HRL techniques have been proposed (e.g., Ring, 1991, 1994; Jameson, 1991; Tenenberg et al., 1993; Weiss, 1994; Moore and Atkeson, 1995; Precup et al., 1998; Dietterich, 2000b; Menache et al., 2002; Doya et al., 2002; Ghavamzadeh and Mahadevan, 2003; Barto and Mahadevan, 2003; Samejima et al., 2003; Bakker and Schmidhuber, 2004; Whiteson et al., 2005; Simsek and Barto, 2008). While HRL frameworks such as Feudal RL (Dayan and Hinton, 1993) and options (Sutton et al., 1999b; Barto et al., 2004; Singh et al., 2005) do not directly address the problem of automatic subgoal discovery, HQ-Learning (Wiering and Schmidhuber, 1998a) automatically decomposes POMDPs (Sec. 6.3) into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive sub-agents.…”
Section: Deep Hierarchical RL (HRL) and Subgoal Learning with FNNs an…
mentioning, confidence: 99%
“…In our implementation, experimental results for which are presented in the following section, we use hierarchical reinforcement learning techniques [13] to learn an optimal frame selection strategy over time. Also, we construct new frames through concatenation in a planning-like manner to achieve the original goal of a conversation that went awry.…”
Section: Adjustment/re-framing
mentioning, confidence: 99%
“…But many different efforts continue to investigate how to address the potential complexity of reinforcement learning. Several such efforts rely on the appealing idea of reusing the knowledge acquired in one learning process to solve other problems, including the transfer of value functions [5], the reuse of options [6], or the learning of hierarchical modules [7]. The cost of the guided learning is consistently reduced.…”
Section: Introduction
mentioning, confidence: 99%