Numerous alternative HRL techniques have been proposed (e.g., Ring, 1991, 1994; Jameson, 1991; Tenenberg et al., 1993; Weiss, 1994; Moore and Atkeson, 1995; Precup et al., 1998; Dietterich, 2000b; Menache et al., 2002; Doya et al., 2002; Ghavamzadeh and Mahadevan, 2003; Barto and Mahadevan, 2003; Samejima et al., 2003; Bakker and Schmidhuber, 2004; Whiteson et al., 2005; Simsek and Barto, 2008). While HRL frameworks such as Feudal RL (Dayan and Hinton, 1993) and options (Sutton et al., 1999b; Barto et al., 2004; Singh et al., 2005) do not directly address the problem of automatic subgoal discovery, HQ-Learning (Wiering and Schmidhuber, 1998a) automatically decomposes POMDPs (Sec. 6.3) into sequences of simpler subtasks, each solvable by a memoryless policy learnable by a reactive sub-agent.
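The control-transfer idea behind this kind of decomposition can be illustrated with a deliberately simplified sketch: a fixed sequence of tabular Q-learning sub-agents, each with its own memoryless (observation-to-action) policy, where control passes to the next sub-agent once the current subgoal is reached. The corridor environment, the fixed subgoal sequence, and all hyperparameters below are illustrative assumptions, not taken from Wiering and Schmidhuber's algorithm (which, in particular, also learns the subgoal sequence itself via separate HQ-tables, and operates on genuinely partially observable tasks; this toy corridor is fully observable for brevity).

```python
import random

random.seed(0)

# Toy corridor: states 0..N-1, start at 0, final goal at N-1.
# Hypothetical setup: two sub-agents with a FIXED subgoal sequence;
# HQ-Learning proper learns the subgoals, which we skip here.
N = 8
SUBGOALS = [4, N - 1]         # illustrative subgoals, not from the paper
ACTIONS = [-1, +1]            # step left / step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

# One memoryless Q-table (observation -> action values) per sub-agent.
Q = [[[0.0] * len(ACTIONS) for _ in range(N)] for _ in SUBGOALS]

for _ in range(300):          # training episodes
    pos = 0
    for k, goal in enumerate(SUBGOALS):   # sub-agents take control in turn
        for _ in range(100):
            if pos == goal:
                break                     # subgoal reached: hand over control
            if random.random() < EPS:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[k][pos][i])
            nxt = min(max(pos + ACTIONS[a], 0), N - 1)
            r = 1.0 if nxt == goal else -0.01   # small step cost, subgoal bonus
            Q[k][pos][a] += ALPHA * (r + GAMMA * max(Q[k][nxt]) - Q[k][pos][a])
            pos = nxt

# Greedy rollout: the chained reactive policies should reach the final goal.
pos, trace = 0, [0]
for k, goal in enumerate(SUBGOALS):
    for _ in range(50):
        if pos == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: Q[k][pos][i])
        pos = min(max(pos + ACTIONS[a], 0), N - 1)
        trace.append(pos)
print(trace)
```

The design point the sketch isolates is that no single sub-agent needs memory: each one only has to map its current observation to an action until its subgoal fires, at which point the next sub-agent's reactive policy takes over, so a sequence of memoryless policies can solve a task that a single memoryless policy could not (in a genuinely partially observable version of the problem).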