Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/458
Successor Options: An Option Discovery Framework for Reinforcement Learning

Abstract: The options framework in reinforcement learning models the notion of a skill, i.e., a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. In this work, we instead adopt a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this wo…
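To make the landmark idea concrete, here is a minimal sketch of the general recipe the abstract describes: learn a successor representation (SR) for a tabular MDP and cluster states in SR space, taking the state nearest each cluster centroid as a landmark. This is an illustration under assumed inputs; the names (`learn_sr`, `find_landmarks`, `transitions`) are hypothetical and not the paper's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_sr(transitions, n_states, gamma=0.95, lr=0.1, epochs=10):
    """TD(0) learning of the successor representation psi[s, s']."""
    psi = np.eye(n_states)
    for _ in range(epochs):
        for s, s_next in transitions:
            # SR target: one-hot occupancy of s plus discounted SR of s'.
            target = np.eye(n_states)[s] + gamma * psi[s_next]
            psi[s] += lr * (target - psi[s])
    return psi

def find_landmarks(psi, n_landmarks=4):
    """Cluster states by their SR rows; pick the state closest to each centroid."""
    km = KMeans(n_clusters=n_landmarks, n_init=10).fit(psi)
    landmarks = []
    for c in range(n_landmarks):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(psi[members] - km.cluster_centers_[c], axis=1)
        landmarks.append(int(members[np.argmin(dists)]))
    return landmarks
```

A state whose SR row lies near a cluster centroid is, by construction, one from which the rest of that well-connected region is easy to reach, which matches the abstract's notion of a prototypical landmark.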

Cited by 18 publications (20 citation statements) · References 3 publications
“…Recall that in the context of value function estimation, Laplacian eigenmaps are also known as proto-value functions. When computing proto-value functions, an environment is decomposed into basis functions of successor features and options, linear combinations of which can compute any given task's value function and hence serve as intrinsic motivation [79,84]. It has been shown that eigen-options, which can be estimated through deep successor representation learning, can serve as intrinsic motivation to solve problems such as Montezuma's Revenge [85].…”
Section: Successor Feature Abstraction and Transfer
confidence: 99%
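As a concrete illustration of the excerpt above, the sketch below computes proto-value functions as the smoothest eigenvectors of the state-graph Laplacian and fits a value function as a linear combination of them. The adjacency matrix and all function names are assumptions for illustration, not code from the cited works.

```python
import numpy as np

def proto_value_functions(adjacency, k=5):
    """Return the k smoothest eigenvectors of the graph Laplacian as basis functions."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
    return eigvecs[:, :k]

def fit_value_function(basis, v_samples, sample_states):
    """Least-squares weights w so that V ~= basis @ w on the sampled states."""
    w, *_ = np.linalg.lstsq(basis[sample_states], v_samples, rcond=None)
    return basis @ w
```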
“…Recall that In the context of value function estimation Lapalcian eigenmaps are also known as proto-value function. When computing the proto-value function, an environment is decomposed into basis functions of successor features and options, the linear combinations of which could compute any given task's value function and hence serve as intrinsic motivation [79,84] . It has been shown that eigen-options, which can be estimated through deep successor representation learning, can serve as intrinsic motivation to solve problems such as Montezuma's revenge [85] .…”
Section: Successor Feature Abstraction and Transfermentioning
confidence: 99%
“…Successor Options [28] used successor features (SF) to discover landmark states and design a latent reward for learning option policies, but was limited to low-dimensional state spaces. Our framework leverages an SF-based similarity metric to formulate a goal-conditioned policy, to abstract the state space as a landmark graph for long-horizon planning, and to model state novelty for driving exploration.…”
Section: Related Work
confidence: 99%
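The SF-based similarity metric mentioned above can be illustrated with a simple cosine similarity between successor-feature vectors; the threshold-based novelty test is one hypothetical reading of "state novelty", not necessarily the cited framework's exact formulation.

```python
import numpy as np

def sf_similarity(psi_a, psi_b, eps=1e-8):
    """Cosine similarity between two successor-feature vectors."""
    return float(psi_a @ psi_b / (np.linalg.norm(psi_a) * np.linalg.norm(psi_b) + eps))

def is_novel(psi_state, landmark_sfs, threshold=0.9):
    """A state counts as novel if its SF is dissimilar to every known landmark."""
    return all(sf_similarity(psi_state, lm) < threshold for lm in landmark_sfs)
```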
“…As described above, this enables us to utilize useful landmark metrics, such as how many times the agent has been localized to each landmark and which transitions have occurred between landmarks, to improve the connectivity quality of the graph. In comparison, landmarks identified through clustering schemes such as in Successor Options [28] cannot be used in this manner because the landmark set is rebuilt every few iterations. See Appendix B.3 for a detailed comparison of landmark formation.…”
Section: Landmark Graph
confidence: 99%
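A persistent landmark graph of the kind contrasted with Successor Options here might be maintained as follows. This is a hedged sketch: the class and method names (`LandmarkGraph`, `localize`, `reliable_edges`) are invented for illustration and are not the cited paper's API.

```python
from collections import defaultdict

class LandmarkGraph:
    """Accumulates localization counts and inter-landmark transitions over time."""

    def __init__(self):
        self.visits = defaultdict(int)   # landmark -> localization count
        self.edges = defaultdict(int)    # (lm_from, lm_to) -> transition count
        self._last = None

    def localize(self, landmark):
        """Record that the agent was localized to `landmark`."""
        self.visits[landmark] += 1
        if self._last is not None and self._last != landmark:
            self.edges[(self._last, landmark)] += 1
        self._last = landmark

    def reliable_edges(self, min_count=3):
        """Edges observed often enough to trust for long-horizon planning."""
        return [e for e, c in self.edges.items() if c >= min_count]
```

The key design point the excerpt makes is persistence: because the landmark set is never discarded, counts keep accumulating, whereas a clustering scheme that rebuilds landmarks every few iterations would reset these statistics.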
“…McGovern et al. [27] and Menache et al. [28] searched for states that act as bottlenecks to generate skills. Tomar et al. [29] used successor representations to generalize the bottleneck approach to continuous state spaces. These works mainly focus on building options rather than on improving skill-learning performance via intrinsic-motivation-based exploration.…”
Section: Related Work
confidence: 99%
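For context on the bottleneck line of work referenced above, one common proxy for bottleneck states is high betweenness centrality on the state-transition graph. The sketch below uses that proxy; it is an assumption for illustration and not necessarily the exact criterion used by McGovern et al. or Menache et al.

```python
import networkx as nx

def bottleneck_states(transitions, top_k=3):
    """Rank states by betweenness centrality on the undirected transition graph."""
    g = nx.Graph()
    g.add_edges_from(transitions)   # transitions: iterable of (s, s') pairs
    centrality = nx.betweenness_centrality(g)
    return sorted(centrality, key=centrality.get, reverse=True)[:top_k]
```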