2021
DOI: 10.48550/arxiv.2111.09794
Preprint

A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

Abstract: The study of generalisation in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real-world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. We provide a unifying formalism and terminology for discussing different…


Cited by 30 publications (47 citation statements)
References 100 publications
“…For instance, gradually training AVs with increasing risk levels under a curriculum learning [194] framework may help systems generalize more easily to more types of safety-critical scenarios. One recent survey [195] that investigates the generalization problem in RL emphasizes the importance of environment generation in increasing the similarity between training and testing domains. This direction extends scenario generation from safety to broader views that require goal-conditioned environment generation.…”
Section: What Are Future Directions
confidence: 99%
“…In order to distinguish different environments, we follow the ideas in Kirk et al. (2021) and consider a Contextual Partially Observable Markov Game, where we introduce a set of contexts K. For each context k ∈ K we have a Partially Observable Markov Game (POMG) with the property that the state of the game can be decomposed into two parts, s = (k, s′) ∈ S_K, where s′ ∈ S is the state and k ∈ K is the context. Formally: Definition 2.2 (Contextual Partially Observable Markov Game (CPOMG)).…”
Section: A Is the Joint Action Space
confidence: 99%
“…The learners then engage in a "meta-learning" problem with awareness of the contexts. This is the background of the framework introduced in Kirk et al. (2021). Similarly, we will consider a discrete set of contexts in our analysis here.…”
Section: A Is the Joint Action Space
confidence: 99%