2021
DOI: 10.48550/arxiv.2101.05265
Preprint

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Abstract: Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high…
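The abstract's description of PSM is truncated here, but for context: as I recall from the preprint, PSM has a recursive, bisimulation-style definition over pairs of states. The display below is a sketch of that form, not a quotation; the symbols (DIST as a probability metric over action distributions, W_1 as the 1-Wasserstein distance, gamma as a discount factor) are assumed notation.

% Sketch of the policy similarity metric (PSM); notation assumed.
% d is the fixed point of the operator below; pi* denotes an optimal policy.
\[
  d(x, y) \;=\; \mathrm{DIST}\big(\pi^{*}(\cdot \mid x),\, \pi^{*}(\cdot \mid y)\big)
  \;+\; \gamma\, W_{1}(d)\big(P^{\pi^{*}}(\cdot \mid x),\, P^{\pi^{*}}(\cdot \mid y)\big)
\]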

Cited by 18 publications (41 citation statements)
References 24 publications
“…Contrastive learning (Srinivas et al., 2020; Chen et al., 2020; Hjelm et al., 2018) allows the algorithm designer to specify positive and negative matches in representation space and to embed a similarity measure by maximizing agreement between positive matches while minimizing it with respect to negative matches; contrastive learning has been shown to improve sample efficiency. Observation prediction and reconstruction (Hafner et al., 2019; Sekar et al., 2020) also provide a rich auxiliary training signal, but force the agent to model and reconstruct task-irrelevant distractors, which can be a significant disadvantage in natural scenarios (Zhang et al., 2020; Agarwal et al., 2021).…”
Section: Related Work (mentioning)
confidence: 99%
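The quoted statement summarizes the contrastive objective in words; below is a minimal runnable sketch of an InfoNCE-style loss of this kind. All names (info_nce_loss, temperature, the toy batch) are illustrative assumptions, not code from any of the cited papers.

import torch
import torch.nn.functional as F

def info_nce_loss(z_anchor, z_positive, temperature=0.1):
    """Maximize agreement between positive pairs; treat all other
    pairs in the batch as negatives (an InfoNCE-style objective)."""
    z_anchor = F.normalize(z_anchor, dim=1)    # unit-norm embeddings
    z_positive = F.normalize(z_positive, dim=1)
    # Cosine-similarity logits: entry (i, j) compares anchor i with positive j.
    logits = z_anchor @ z_positive.t() / temperature
    # For anchor i, column i is its positive match; all other columns are negatives.
    targets = torch.arange(z_anchor.size(0), device=z_anchor.device)
    return F.cross_entropy(logits, targets)

# Toy usage: two slightly perturbed "views" of the same states form positive pairs.
anchors = torch.randn(32, 64)
positives = anchors + 0.05 * torch.randn(32, 64)
loss = info_nce_loss(anchors, positives)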
“…The works of Agarwal et al. (2021) and Zhang et al. (2020) (PSM and DBC, respectively) tackle the problem of approximating policy bisimulation (or related notions) over partially observable Markov decision processes (POMDPs). Unfortunately, both approximations suffer from drawbacks: their proposed bisimulation optimization objectives have nontrivial biases when estimating Wasserstein distances, and they provably lose important metric properties such as state self-similarity.…”
Section: Related Work (mentioning)
confidence: 99%
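For readers unfamiliar with the terminology in this statement: state self-similarity is the metric axiom that every state lies at distance zero from itself, d(x, x) = 0. The display below is a hedged sketch of the standard bisimulation-metric fixed point that DBC and PSM approximate (notation assumed, not quoted from the cited works).

% Sketch of a bisimulation-style metric; d is the fixed point of:
\[
  d(x, y) \;=\; \max_{a \in A} \Big( \big| R(x, a) - R(y, a) \big|
  \;+\; \gamma\, W_{1}(d)\big(P(\cdot \mid x, a),\, P(\cdot \mid y, a)\big) \Big)
\]
% A valid metric requires d(x, x) = 0 (self-similarity); the quoted critique
% is that the DBC/PSM training objectives can provably violate this.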