2022
DOI: 10.48550/arxiv.2202.08417
Preprint

Retrieval-Augmented Reinforcement Learning

Abstract: Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative…

Cited by 3 publications (7 citation statements)
References 39 publications

“…We randomly zero-out a subset of retrieved neighbors during training ("neighbor dropout"), and/or more adversarially, randomly replace a subset of retrieved neighbors with the neighbors of a different observation ("neighbor randomisation"). Inspired by [10], we also explore using a loss to regularise the embedding produced by the neighbor retrieval towards the embedding produced with the observation alone ("neighbor regularisation"). Further details are given in Sec.…”
Section: Regularisation
confidence: 99%
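
As a minimal illustrative sketch only (not the cited paper's implementation), the "neighbor dropout", "neighbor randomisation", and "neighbor regularisation" described above could be expressed roughly as follows, assuming the retrieved neighbors are held as a (batch, k, dim) tensor of embeddings; all function names and hyperparameters here are hypothetical.

```python
import torch
import torch.nn.functional as F

def perturb_neighbors(neighbors: torch.Tensor,
                      dropout_p: float = 0.2,
                      randomise_p: float = 0.1) -> torch.Tensor:
    """Apply neighbor dropout and neighbor randomisation during training.

    neighbors: (batch, k, dim) embeddings of the k retrieved neighbors.
    """
    batch, k, _ = neighbors.shape

    # Neighbor dropout: zero out a random subset of each observation's neighbors.
    keep_mask = (torch.rand(batch, k, 1, device=neighbors.device) > dropout_p).float()
    neighbors = neighbors * keep_mask

    # Neighbor randomisation: replace a random subset of neighbors with the
    # neighbors retrieved for a different observation in the same batch.
    swap_mask = torch.rand(batch, k, 1, device=neighbors.device) < randomise_p
    shuffled = neighbors[torch.randperm(batch, device=neighbors.device)]
    return torch.where(swap_mask, shuffled, neighbors)

def neighbor_regularisation_loss(emb_with_neighbors: torch.Tensor,
                                 emb_obs_only: torch.Tensor) -> torch.Tensor:
    # Neighbor regularisation: pull the neighbor-conditioned embedding towards
    # the embedding computed from the observation alone.
    return F.mse_loss(emb_with_neighbors, emb_obs_only.detach())
```

In this sketch the perturbations are applied only at training time, so the agent learns not to over-rely on any particular retrieved neighbor.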
“…through specifying the agent's action-value directly in terms of previously generated value estimates [6,13,15,28], or a model from observed transitions [38]) but rather learn end-to-end how the data can support better predictions within the parametric model. A recent approach by Goyal et al [10] has considered an attention mechanism to select where and what to use from available trajectories, but over a small retrieval batch of data rather than the full available experience data. Another class of method to leverage a transition dataset is to replay the data at training time in order to perform more gradient steps per experience, this is a widespread technique in modern RL algorithms [21,22,24,35] but it does not benefit the agent at test time, requires additional learning steps to adapt to new data, and does not allow end-to-end learning of how to relate past experience to new situations.…”
Section: Related Work
confidence: 99%
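
To make the attention-based retrieval mentioned above concrete, the following is a hedged sketch (not Goyal et al.'s code) of how a current observation embedding might attend over a small retrieval batch of trajectory embeddings, using PyTorch's standard multi-head attention; the class name and fusion step are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RetrievalAttention(nn.Module):
    """Cross-attention from the current observation to retrieved trajectories."""

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, obs_emb: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        """obs_emb: (batch, dim); retrieved: (batch, m, dim) trajectory embeddings."""
        query = obs_emb.unsqueeze(1)                       # (batch, 1, dim)
        attended, _ = self.attn(query, retrieved, retrieved)
        # Fuse what was read from the retrieval batch with the observation itself.
        return obs_emb + attended.squeeze(1)
```

The point of contrast drawn in the citation is that such attention operates over a small retrieval batch, whereas the cited work learns end-to-end how to query the full experience dataset.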
“…The hallmark of machine learning is to be able to develop models that can quickly adapt to new tasks once trained on sufficiently diverse tasks [5,54]. There are multiple ways to transfer information from one task to another: (1) transfer information via the transfer of the neural network weights (when trained on source tasks); (2) reuse raw data as in retrieval-based methods [6,37,32,33,19,51,17]; or (3), via knowledge distillation [23]. Each approach implies inevitable trade-offs: When directly transferring neural network weights, previous information about the data may be lost in the finetuning process, while transfer via raw data may be prohibitively expensive as there can be hundreds of thousands of past experiences.…”
Section: Related Work
confidence: 99%