Successor Feature Sets: Generalizing Successor Representations Across Policies

Brantley, Kianté; Mehri, Soroush; Gordon, Geoffrey J.

doi:10.48550/arxiv.2103.02650

Cited by 3 publications

(3 citation statements)

References 0 publications

“…separately parameterized policy-dependent transition model and an instantaneous reward model (Kulkarni et al 2016; Lehnert and Littman 2020). A wide variety of uses have been proposed for the SF: aiding in exploration (Janz et al 2019; Machado, Bellemare, and Bowling 2020), option discovery (Machado, Bellemare, and Bowling 2017; Machado et al 2018), and transferring across multiple goals (Lehnert, Tellex, and Littman 2017; Zhang et al 2017; Ma et al 2020; Brantley, Mehri, and Gordon 2021), in particular through the generalized policy improvement framework (Barreto et al 2017, 2018; Borsa et al 2018; Hansen et al 2019; Grimm et al 2019). Our method adds to this repertoire, by using the SF inside the learning target in bootstrapping methods.…”

Section: Related Work (mentioning)

confidence: 99%

A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions

GX-Chen, Chelu, Richards, et al. 2022. AAAI

Estimating value functions is a core component of reinforcement learning algorithms. Temporal difference (TD) learning algorithms use bootstrapping, i.e. they update the value function toward a learning target using value estimates at subsequent time-steps. Alternatively, the value function can be updated toward a learning target constructed by separately predicting successor features (SF), a policy-dependent model, and linearly combining them with instantaneous rewards. We focus on bootstrapping targets used when estimating value functions, and propose a new backup target, the η-return mixture, which implicitly combines value-predictive knowledge (used by TD methods) with (successor) feature-predictive knowledge, with a parameter η capturing how much to rely on each. We illustrate that incorporating predictive knowledge through an ηγ-discounted SF model makes more efficient use of sampled experience, compared to either extreme, i.e. bootstrapping entirely on the value function estimate, or bootstrapping on the product of separately estimated successor features and instantaneous reward models. We empirically show this approach leads to faster policy evaluation and better control performance, for tabular and nonlinear function approximations, indicating scalability and generality.
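The two extremes contrasted in this abstract can be illustrated on a small tabular problem. The sketch below is not the paper's η-return mixture; it only shows, under assumed toy dynamics, that a value obtained by repeated bootstrapped backups agrees with a value built from successor features combined linearly with reward weights. The transition matrix `P`, feature matrix `Phi`, weight vector `w`, and both function names are hypothetical choices made for illustration.

```python
import numpy as np

def successor_features(P, Phi, gamma):
    """Closed-form SF of a fixed policy: Psi = (I - gamma * P)^{-1} Phi."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, Phi)

def bootstrapped_evaluation(P, r, gamma, sweeps=2000):
    """Expected one-step backups v <- r + gamma * P v, repeated until convergence."""
    v = np.zeros_like(r)
    for _ in range(sweeps):
        v = r + gamma * P @ v
    return v

# Hypothetical 3-state Markov chain induced by some fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])
Phi = np.eye(3)                  # one-hot features (assumption)
w = np.array([0.0, 0.0, 1.0])    # reward weights, so r(s) = Phi[s] @ w
gamma = 0.9

r = Phi @ w
Psi = successor_features(P, Phi, gamma)
v_sf = Psi @ w                               # feature-predictive route
v_td = bootstrapped_evaluation(P, r, gamma)  # value-predictive route
```

With exact models both routes reach the same fixed point; the abstract's interest is in how they differ when each is estimated from sampled experience.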

“…A further direction is the generalization of the ψ-function over policies [Borsa et al, 2018] analogous to universal value function approximation [Schaul et al, 2015]. Similar approaches use successor maps [Madarasz, 2019], goal-conditioned policies [Ma et al, 2020], or successor feature sets [Brantley et al, 2021]. However, none of these extensions studied the usage of SF in combination with episodic memory.…”

Section: Related Work (mentioning)

confidence: 99%

Successor Feature Neural Episodic Control

Emukpere, Alameda-Pineda, Reinke. 2021. Preprint

A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and a flexible transfer of skills akin to humans and animals. This paper investigates the integration of two frameworks for tackling those goals: episodic control and successor features. Episodic control is a cognitively inspired approach relying on episodic memory, an instance-based memory model of an agent's experiences. Meanwhile, successor features and generalized policy improvement (SF&GPI) is a meta- and transfer-learning framework that learns policies which can be efficiently reused for later tasks with a different reward function. Individually, these two techniques have shown impressive results in vastly improving sample efficiency and the elegant reuse of previously learned policies. Thus, we outline a combination of both approaches in a single reinforcement learning framework and empirically illustrate its benefits.

“…Another direction is the generalization of the ψ-function over policies analogous to universal value function approximation (Schaul et al, 2015). Similar approaches use successor maps (Madarasz, 2019), goal-conditioned policies (Ma et al, 2020), or successor feature sets (Brantley et al, 2021). Other directions include their application to POMDPs (Vértes and Sahani, 2019), combination with max-entropy principles (Vertes, 2020), or hierarchical RL (Barreto et al, 2021).…”

Section: Related Work (mentioning)

confidence: 99%

Successor Feature Representations

Reinke, Alameda-Pineda. 2021. Preprint

Transfer in Reinforcement Learning aims to improve learning performance on target tasks using knowledge from experienced source tasks. Successor features (SF) are a prominent transfer mechanism in domains where the reward function changes between tasks. They reevaluate the expected return of previously learned policies in a new target task in order to transfer their knowledge. A limiting factor of the SF framework is its assumption that rewards linearly decompose into successor features and a reward weight vector. We propose a novel SF mechanism, ξ-learning, based on learning the cumulative discounted probability of successor features. Crucially, ξ-learning allows reevaluating the expected return of policies for general reward functions. We introduce two ξ-learning variations, prove its convergence, and provide a guarantee on its transfer performance. Experimental evaluations based on ξ-learning with function approximation demonstrate the prominent advantage of ξ-learning over available mechanisms not only for general reward functions, but also in the case of linearly decomposable reward functions.
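The linear reevaluation step that this abstract treats as the standard SF baseline can be sketched as follows. This is not ξ-learning; it is a minimal illustration, under the linear-reward assumption the abstract describes as limiting, of how stored successor features of earlier policies are rescored for a new reward weight vector, followed by a greedy choice across policies in the style of generalized policy improvement. The array layout, the toy numbers, and the names `reevaluate` and `gpi_action` are hypothetical.

```python
import numpy as np

def reevaluate(psi, w_new):
    """Q-values of stored policies under new reward weights.

    psi has shape (n_policies, n_states, n_actions, d); the result has
    shape (n_policies, n_states, n_actions), computed as psi . w_new.
    """
    return psi @ w_new

def gpi_action(psi, w_new, state):
    """Pick the action maximizing, over stored policies, the reevaluated Q-value."""
    q = reevaluate(psi, w_new)[:, state, :]   # (n_policies, n_actions)
    return int(np.argmax(q.max(axis=0)))

# Toy successor features: 2 stored policies, 1 state, 2 actions, 2 features.
psi = np.zeros((2, 1, 2, 2))
psi[0, 0, 0] = [1.0, 0.0]
psi[0, 0, 1] = [0.0, 0.5]
psi[1, 0, 0] = [0.2, 0.2]
psi[1, 0, 1] = [0.0, 2.0]

action_task_a = gpi_action(psi, np.array([0.0, 1.0]), state=0)  # rewards feature 2
action_task_b = gpi_action(psi, np.array([1.0, 0.0]), state=0)  # rewards feature 1
```

Changing only the weight vector changes the selected action without relearning the successor features, which is the reuse the abstract refers to.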
