2020
DOI: 10.48550/arxiv.2009.08319
Preprint
Decoupling Representation Learning from Reinforcement Learning

Abstract: In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the en…
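To make the abstract's objective concrete, here is a minimal NumPy sketch of an ATC-style contrastive loss. It is an illustration under stated assumptions, not the paper's implementation: the encoder is a hypothetical linear projection (the paper uses a convolutional encoder with a momentum target network), and the "augmentation" is additive noise standing in for the paper's image augmentations. What it does show is the core idea: each observation o_t must match its own near-future observation o_{t+k}, with the other future observations in the batch serving as negatives.

```python
# Sketch of an ATC-style InfoNCE loss: anchors are (augmented) observations o_t,
# positives are their short-horizon futures o_{t+k}, negatives are the other
# futures in the batch. Encoder and augmentation are simplified stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def encode(obs, W):
    """Stand-in encoder: linear projection + L2 normalization (paper: CNN)."""
    z = obs @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def atc_loss(obs_t, obs_tk, W, temperature=0.1):
    """Each row of the similarity matrix is a classification problem whose
    correct class is the matching future observation (the diagonal)."""
    anchors = encode(obs_t + 0.01 * rng.standard_normal(obs_t.shape), W)   # "augmented" o_t
    targets = encode(obs_tk + 0.01 * rng.standard_normal(obs_tk.shape), W)  # "augmented" o_{t+k}
    logits = anchors @ targets.T / temperature            # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy on the diagonal

B, D, Z = 8, 32, 16
obs_t = rng.standard_normal((B, D))
obs_tk = obs_t + 0.05 * rng.standard_normal((B, D))       # "short time difference"
W = rng.standard_normal((D, Z))
loss = atc_loss(obs_t, obs_tk, W)
print(loss)
```

In the actual method this loss trains only the encoder, which the RL policy then consumes; that separation is the "decoupling" the title refers to.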

Cited by 18 publications (25 citation statements)
References 16 publications (28 reference statements)
“…This is primarily because XGBoost, being an "instructive" process, has access to complete data during training, which allows it to learn a better representation of the data compared to a DRL agent trained in an episodic manner. These problems can potentially be resolved by handling the distribution shift in offline reinforcement learning [19], using a better curriculum strategy [30], or by solving the representation learning problem [41].…”
Section: Discussion
confidence: 99%
“…Namely, Laskin et al. [22] use data augmentations of a sample as positives, and all other samples in the batch, as well as their augmentations, as negatives. Similarly, the work in [34] uses contrastive learning to associate pairs of observations separated by a short time difference, hence using (near) future observations as positive queries and all other samples in the batch as negatives. Both AE-based methods and contrastive learning focus on compression of observations as the main goal for SRL.…”
Section: SRL Model
confidence: 99%
“…Namely, we use RAE and contrastive methods. Although contrastive learning has shown superior results to AE-based approaches [22], [34], AE-based methods still have many advantages. They are simple to implement, allow for integrating self-supervised objectives such as jigsaw puzzles [25], enable multi-modal and multi-view fusion [1], [23], as well as task-specific objectives such as contact prediction [23].…”
Section: SRL Model
confidence: 99%
“…Partly in response to these and related shortcomings, some of the AI community has suggested that it may be desirable to decouple feature importance from representation learning [97,92,106]. For scientific inquiry, however, this decoupling is only useful if the result is human-comprehensible and interpretable (as defined herein).…”
Section: Explainability Versus Interpretability
confidence: 99%