2018
DOI: 10.48550/arxiv.1808.00928
Preprint

Learning Actionable Representations from Visual Observations

Cited by 2 publications (3 citation statements)
References 0 publications
“…Self-supervised learning from sequences. Previous work in contrastive learning for sequential data often leverages a slowness assumption to use nearby samples as positive examples and farther samples as negative examples (Oord et al, 2018; Sermanet et al, 2018; Dwibedi et al, 2019; Le-Khac et al, 2020; Banville et al, 2020). Contrastive predictive coding (CPC) (Oord et al, 2018) builds upon the idea of temporal contrastive learning by building an autoregressive (AR) model that predicts future points given previously observed timesteps.…”
Section: Related Work
confidence: 99%
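The slowness assumption described in the excerpt can be sketched as a minimal InfoNCE-style loss in NumPy, where each timestep's positive is a nearby timestep and all other timesteps serve as negatives. The function name and the `window` and `temperature` parameters are illustrative assumptions, not the exact objective of any cited paper.

```python
import numpy as np

def temporal_infonce_loss(embeddings, window=1, temperature=0.1):
    """InfoNCE-style loss over a sequence of embeddings, shape (T, D).

    Under the slowness assumption, timestep t treats the nearby
    timestep t + window as its positive and every other timestep as a
    negative. Hypothetical sketch of temporal contrastive learning.
    """
    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature           # (T, T) similarity matrix
    T = len(z)
    losses = []
    for t in range(T - window):
        pos = t + window                  # nearby sample = positive
        logits = np.delete(sim[t], t)     # drop the self-similarity term
        # index of the positive after removing the self entry
        pos_idx = pos - 1 if pos > t else pos
        log_prob = logits[pos_idx] - np.log(np.sum(np.exp(logits)))
        losses.append(-log_prob)
    return float(np.mean(losses))
```

Minimising this loss pulls temporally adjacent embeddings together while pushing distant ones apart, which is the shared idea behind the temporal-contrastive methods the excerpt cites.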
“…Temporal shift: As in previous work in temporal contrastive learning (Oord et al, 2018; Sermanet et al, 2018; Dwibedi et al, 2019; Le-Khac et al, 2020; Banville et al, 2020), we can use nearby samples as positive examples for one another.…”
Section: B.3 Augmentations for Neural Data
confidence: 99%
“…A third way we exploit viewpoint changes is for multiple-view self-supervised representation learning. The ability to observe different views of an object or a scene has been used in prior work ([21], [23], [26], [27]) to learn low-dimensional state representations without human annotation. Efficient encoding of object and scene properties from high-dimensional images is essential for vision-based manipulation; we utilize Generative Query Networks [27] for this purpose.…”
Section: Introduction
confidence: 99%
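The multiple-view idea in the excerpt above pairs frames captured at the same moment from different cameras. A minimal NumPy sketch of that pairing checks, for each timestep, whether the nearest neighbour of one view's embedding among the other view's embeddings is the same timestep; the function name and setup are hypothetical, not an implementation from any cited work.

```python
import numpy as np

def cross_view_alignment(view_a, view_b):
    """Fraction of timesteps t whose nearest neighbour of view_a[t]
    among all view_b frames is view_b[t] (the same moment seen from
    another camera). Both inputs have shape (T, D). Hypothetical
    sketch of multiple-view positive pairing."""
    # L2-normalise so the cross-view similarity is cosine similarity.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    sim = a @ b.T                         # (T, T) cross-view similarities
    matches = np.argmax(sim, axis=1) == np.arange(len(a))
    return float(np.mean(matches))
```

A representation trained with same-timestep, cross-view positives should drive this alignment score toward 1.0, since embeddings of the same moment from different viewpoints are pulled together.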