“…In particular, gradient-based subgoal discovery with FNNs or RNNs decomposes RL tasks into subtasks for RL submodules (Schmidhuber, 1991b; Schmidhuber and Wahnsiedler, 1992). Numerous alternative HRL techniques have been proposed (e.g., Ring, 1991, 1994; Jameson, 1991; Tenenberg et al., 1993; Weiss, 1994; Moore and Atkeson, 1995; Precup et al., 1998; Dietterich, 2000b; Menache et al., 2002; Doya et al., 2002; Ghavamzadeh and Mahadevan, 2003; Barto and Mahadevan, 2003; Samejima et al., 2003; Bakker and Schmidhuber, 2004; Whiteson et al., 2005; Simsek and Barto, 2008). While HRL frameworks such as Feudal RL (Dayan and Hinton, 1993) and options (Sutton et al., 1999b; Barto et al., 2004; Singh et al., 2005) do not directly address the problem of automatic subgoal discovery, HQ-Learning (Wiering and Schmidhuber, 1998a) automatically decomposes POMDPs (Sec.…”
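The core idea shared by these HRL approaches, decomposing a long task into subtasks delimited by subgoals, can be illustrated with a minimal, purely hypothetical sketch (not an implementation of any cited algorithm): a high-level controller hands fixed subgoal states to a low-level controller, which issues primitive actions on a 1-D corridor until each subgoal is reached.

```python
# Hypothetical two-level control sketch: a high-level controller supplies
# subgoals; a low-level controller executes primitive moves (+1/-1) on a
# 1-D corridor until the current subgoal is reached.

def low_level_policy(state, subgoal):
    """Primitive action moving one step toward the subgoal."""
    return 1 if subgoal > state else -1

def run_hierarchy(start, goal, subgoals):
    """Pursue each subgoal in order, then the final goal."""
    state = start
    trajectory = [state]
    for sg in list(subgoals) + [goal]:
        while state != sg:
            state += low_level_policy(state, sg)
            trajectory.append(state)
    return trajectory

traj = run_hierarchy(start=0, goal=9, subgoals=[3, 6])
print(traj)
```

In the subgoal-discovery methods cited above, the subgoal list would of course be learned rather than hand-specified; the sketch only shows how a hierarchy shortens the horizon each controller must handle.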