2021
DOI: 10.48550/arxiv.2103.02650
Preprint

Successor Feature Sets: Generalizing Successor Representations Across Policies

citations
Cited by 3 publications
(3 citation statements)
references
References 0 publications
“…separately parameterized policy-dependent transition model and an instantaneous reward model (Kulkarni et al 2016; Lehnert and Littman 2020). A wide variety of uses have been proposed for the SF: aiding in exploration (Janz et al 2019; Machado, Bellemare, and Bowling 2020), option discovery (Machado, Bellemare, and Bowling 2017; Machado et al 2018), and transferring across multiple goals (Lehnert, Tellex, and Littman 2017; Zhang et al 2017; Ma et al 2020; Brantley, Mehri, and Gordon 2021), in particular through the generalized policy improvement framework (Barreto et al 2017, 2018; Borsa et al 2018; Hansen et al 2019; Grimm et al 2019). Our method adds to this repertoire, by using the SF inside the learning target in bootstrapping methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…A further direction is the generalization of the ψ-function over policies [Borsa et al, 2018] analogous to universal value function approximation [Schaul et al, 2015]. Similar approaches use successor maps [Madarasz, 2019], goal-conditioned policies [Ma et al, 2020], or successor feature sets [Brantley et al, 2021]. However, none of these extensions studied the usage of SF in combination with episodic memory.…”
Section: Related Work (mentioning)
confidence: 99%
“…Another direction is the generalization of the ψ-function over policies analogous to universal value function approximation (Schaul et al, 2015). Similar approaches use successor maps (Madarasz, 2019), goal-conditioned policies (Ma et al, 2020), or successor feature sets (Brantley et al, 2021). Other directions include their application to POMDPs (Vértes and Sahani, 2019), combination with max-entropy principles (Vertes, 2020), or hierarchical RL (Barreto et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%