2021
DOI: 10.48550/arxiv.2101.05265
Preprint

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Abstract: Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high…
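The abstract's description of PSM is truncated here, but for context: as I recall from the preprint, PSM has a recursive, bisimulation-style definition over pairs of states. The display below is a sketch of that form, not a quotation; the symbols (DIST as a probability metric over action distributions, W_1 as the 1-Wasserstein distance, gamma as a discount factor) are assumed notation.

% Sketch of the policy similarity metric (PSM); notation assumed.
% d is the fixed point of the operator below; pi* denotes an optimal policy.
\[
  d(x, y) \;=\; \mathrm{DIST}\big(\pi^{*}(\cdot \mid x),\, \pi^{*}(\cdot \mid y)\big)
  \;+\; \gamma\, W_{1}(d)\big(P^{\pi^{*}}(\cdot \mid x),\, P^{\pi^{*}}(\cdot \mid y)\big)
\]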

Cited by 18 publications (41 citation statements)
References 24 publications
“…Contrastive learning (Srinivas et al., 2020; Chen et al., 2020; Hjelm et al., 2018) allows the algorithm designer to specify positive and negative matches in representation space and to embed a similarity measure by maximizing agreement between positive matches while minimizing it with respect to negative matches; contrastive learning has been shown to improve sample efficiency. Observation prediction and reconstruction (Hafner et al., 2019; Sekar et al., 2020) also provide a rich auxiliary training signal, but force the agent to model and reconstruct task-irrelevant distractors, which can be a significant disadvantage in natural scenarios (Zhang et al., 2020; Agarwal et al., 2021).…”
Section: Related Work (mentioning)
confidence: 99%
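The quoted statement summarizes the contrastive objective in words; below is a minimal runnable sketch of an InfoNCE-style loss of this kind. All names (info_nce_loss, temperature, the toy batch) are illustrative assumptions, not code from any of the cited papers.

import torch
import torch.nn.functional as F

def info_nce_loss(z_anchor, z_positive, temperature=0.1):
    """Maximize agreement between positive pairs; treat all other
    pairs in the batch as negatives (an InfoNCE-style objective)."""
    z_anchor = F.normalize(z_anchor, dim=1)    # unit-norm embeddings
    z_positive = F.normalize(z_positive, dim=1)
    # Cosine-similarity logits: entry (i, j) compares anchor i with positive j.
    logits = z_anchor @ z_positive.t() / temperature
    # For anchor i, column i is its positive match; all other columns are negatives.
    targets = torch.arange(z_anchor.size(0), device=z_anchor.device)
    return F.cross_entropy(logits, targets)

# Toy usage: two slightly perturbed "views" of the same states form positive pairs.
anchors = torch.randn(32, 64)
positives = anchors + 0.05 * torch.randn(32, 64)
loss = info_nce_loss(anchors, positives)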
“…The works of Agarwal et al. (2021) and Zhang et al. (2020) (PSM and DBC, respectively) tackle the problem of approximating policy bisimulation (or related notions) over partially observable Markov decision processes (POMDPs). Unfortunately, both approximations suffer from drawbacks: their proposed bisimulation optimization objectives have nontrivial biases when estimating Wasserstein distances, and they provably lose important metric properties such as state self-similarity.…”
Section: Related Work (mentioning)
confidence: 99%
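For readers unfamiliar with the terminology in this statement: state self-similarity is the metric axiom that every state lies at distance zero from itself, d(x, x) = 0. The display below is a hedged sketch of the standard bisimulation-metric fixed point that DBC and PSM approximate (notation assumed, not quoted from the cited works).

% Sketch of a bisimulation-style metric; d is the fixed point of:
\[
  d(x, y) \;=\; \max_{a \in A} \Big( \big| R(x, a) - R(y, a) \big|
  \;+\; \gamma\, W_{1}(d)\big(P(\cdot \mid x, a),\, P(\cdot \mid y, a)\big) \Big)
\]
% A valid metric requires d(x, x) = 0 (self-similarity); the quoted critique
% is that the DBC/PSM training objectives can provably violate this.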