Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/458
Successor Options: An Option Discovery Framework for Reinforcement Learning

Abstract: The options framework in reinforcement learning models the notion of a skill, i.e., a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. In this work, we instead adopt a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this wo…
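To make the landmark idea concrete, here is a minimal sketch of the general recipe the abstract describes: learn a successor representation (SR) for a tabular MDP and cluster states in SR space, taking the state nearest each cluster centroid as a landmark. This is an illustration under assumed inputs; the names (`learn_sr`, `find_landmarks`, `transitions`) are hypothetical and not the paper's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_sr(transitions, n_states, gamma=0.95, lr=0.1, epochs=10):
    """TD(0) learning of the successor representation psi[s, s']."""
    psi = np.eye(n_states)
    for _ in range(epochs):
        for s, s_next in transitions:
            # SR target: one-hot occupancy of s plus discounted SR of s'.
            target = np.eye(n_states)[s] + gamma * psi[s_next]
            psi[s] += lr * (target - psi[s])
    return psi

def find_landmarks(psi, n_landmarks=4):
    """Cluster states by their SR rows; pick the state closest to each centroid."""
    km = KMeans(n_clusters=n_landmarks, n_init=10).fit(psi)
    landmarks = []
    for c in range(n_landmarks):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(psi[members] - km.cluster_centers_[c], axis=1)
        landmarks.append(int(members[np.argmin(dists)]))
    return landmarks
```

A state whose SR row lies near a cluster centroid is, by construction, one from which the rest of that well-connected region is easy to reach, which matches the abstract's notion of a prototypical landmark.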

Cited by 18 publications (20 citation statements) · References 3 publications
“…Recall that in the context of value function estimation, Laplacian eigenmaps are also known as proto-value functions. When computing proto-value functions, an environment is decomposed into basis functions of successor features and options, linear combinations of which can compute any given task's value function and hence serve as intrinsic motivation [79,84]. It has been shown that eigen-options, which can be estimated through deep successor representation learning, can serve as intrinsic motivation to solve problems such as Montezuma's Revenge [85].…”
Section: Successor Feature Abstraction and Transfer
confidence: 99%
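As a concrete illustration of the excerpt above, the sketch below computes proto-value functions as the smoothest eigenvectors of the state-graph Laplacian and fits a value function as a linear combination of them. The adjacency matrix and all function names are assumptions for illustration, not code from the cited works.

```python
import numpy as np

def proto_value_functions(adjacency, k=5):
    """Return the k smoothest eigenvectors of the graph Laplacian as basis functions."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
    return eigvecs[:, :k]

def fit_value_function(basis, v_samples, sample_states):
    """Least-squares weights w so that V ~= basis @ w on the sampled states."""
    w, *_ = np.linalg.lstsq(basis[sample_states], v_samples, rcond=None)
    return basis @ w
```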
“…Recall that In the context of value function estimation Lapalcian eigenmaps are also known as proto-value function. When computing the proto-value function, an environment is decomposed into basis functions of successor features and options, the linear combinations of which could compute any given task's value function and hence serve as intrinsic motivation [79,84] . It has been shown that eigen-options, which can be estimated through deep successor representation learning, can serve as intrinsic motivation to solve problems such as Montezuma's revenge [85] .…”
Section: Successor Feature Abstraction and Transfermentioning
confidence: 99%
“…Successor Options [28] used successor features (SF) to discover landmark states and design a latent reward for learning option policies, but was limited to low-dimensional state spaces. Our framework leverages an SF-based similarity metric to formulate a goal-conditioned policy, to abstract the state space as a landmark graph for long-horizon planning, and to model state novelty for driving exploration.…”
Section: Related Work
confidence: 99%
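The SF-based similarity metric mentioned above can be illustrated with a simple cosine similarity between successor-feature vectors; the threshold-based novelty test is one hypothetical reading of "state novelty", not necessarily the cited framework's exact formulation.

```python
import numpy as np

def sf_similarity(psi_a, psi_b, eps=1e-8):
    """Cosine similarity between two successor-feature vectors."""
    return float(psi_a @ psi_b / (np.linalg.norm(psi_a) * np.linalg.norm(psi_b) + eps))

def is_novel(psi_state, landmark_sfs, threshold=0.9):
    """A state counts as novel if its SF is dissimilar to every known landmark."""
    return all(sf_similarity(psi_state, lm) < threshold for lm in landmark_sfs)
```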
“…As described above, this enables us to utilize useful landmark metrics, such as how many times the agent has been localized to each landmark and which transitions have occurred between landmarks, to improve the connectivity quality of the graph. In comparison, landmarks identified through clustering schemes such as in Successor Options [28] cannot be used in this manner because the landmark set is rebuilt every few iterations. See Appendix B.3 for a detailed comparison of landmark formation.…”
Section: Landmark Graph
confidence: 99%
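A persistent landmark graph of the kind contrasted with Successor Options here might be maintained as follows. This is a hedged sketch: the class and method names (`LandmarkGraph`, `localize`, `reliable_edges`) are invented for illustration and are not the cited paper's API.

```python
from collections import defaultdict

class LandmarkGraph:
    """Accumulates localization counts and inter-landmark transitions over time."""

    def __init__(self):
        self.visits = defaultdict(int)   # landmark -> localization count
        self.edges = defaultdict(int)    # (lm_from, lm_to) -> transition count
        self._last = None

    def localize(self, landmark):
        """Record that the agent was localized to `landmark`."""
        self.visits[landmark] += 1
        if self._last is not None and self._last != landmark:
            self.edges[(self._last, landmark)] += 1
        self._last = landmark

    def reliable_edges(self, min_count=3):
        """Edges observed often enough to trust for long-horizon planning."""
        return [e for e, c in self.edges.items() if c >= min_count]
```

The key design point the excerpt makes is persistence: because the landmark set is never discarded, counts keep accumulating, whereas a clustering scheme that rebuilds landmarks every few iterations would reset these statistics.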
“…McGovern et al. [27] and Menache et al. [28] searched for states that act as bottlenecks to generate skills. Tomar et al. [29] used successor representations to generalize the bottleneck approach to continuous state spaces. These works mainly focus on building options rather than on improving skill-learning performance via intrinsic-motivation-based exploration.…”
Section: Related Work
confidence: 99%
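For context on the bottleneck line of work referenced above, one common proxy for bottleneck states is high betweenness centrality on the state-transition graph. The sketch below uses that proxy; it is an assumption for illustration and not necessarily the exact criterion used by McGovern et al. or Menache et al.

```python
import networkx as nx

def bottleneck_states(transitions, top_k=3):
    """Rank states by betweenness centrality on the undirected transition graph."""
    g = nx.Graph()
    g.add_edges_from(transitions)   # transitions: iterable of (s, s') pairs
    centrality = nx.betweenness_centrality(g)
    return sorted(centrality, key=centrality.get, reverse=True)[:top_k]
```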