2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2018.8593951
Learning Actionable Representations from Visual Observations

Abstract: In this work we explore a new approach for robots to teach themselves about the world simply by observing it. In particular, we investigate the effectiveness of learning task-agnostic representations for continuous control tasks. We extend Time-Contrastive Networks (TCN), which learn from visual observations, by embedding multiple frames jointly in the embedding space as opposed to a single frame. We show that by doing so, we are able to encode both position and velocity attributes significantly more accurately…
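The abstract describes embedding a short stack of frames jointly so the representation can capture velocity as well as position. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the authors' exact architecture: the class name `MultiFrameTCN`, the constants `N_FRAMES` and `EMB_DIM`, the helper `tcn_triplet_loss`, and all layer sizes are assumptions made for illustration. It uses the standard multi-view time-contrastive recipe, where co-temporal clips from two viewpoints are pulled together and a temporally distant clip is pushed away via a triplet loss.

```python
# Minimal sketch of a multi-frame time-contrastive embedding (assumed names and
# shapes; not the paper's exact network). A stack of consecutive frames is
# embedded jointly, and a triplet loss keeps co-temporal clips from two views
# close while pushing away a temporally distant clip.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_FRAMES, EMB_DIM = 3, 32  # assumed values for illustration


class MultiFrameTCN(nn.Module):
    def __init__(self, n_frames=N_FRAMES, emb_dim=EMB_DIM):
        super().__init__()
        # Frames are concatenated along the channel axis: 3 * n_frames channels.
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_frames, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, clip):                 # clip: (B, 3*n_frames, H, W)
        return F.normalize(self.net(clip), dim=1)


def tcn_triplet_loss(model, anchor_clip, positive_clip, negative_clip, margin=0.2):
    """anchor/positive: same time step from different views; negative: distant time."""
    a, p, n = model(anchor_clip), model(positive_clip), model(negative_clip)
    return F.triplet_margin_loss(a, p, n, margin=margin)


# Example forward/backward pass with random data standing in for video clips.
model = MultiFrameTCN()
clips = [torch.randn(4, 3 * N_FRAMES, 64, 64) for _ in range(3)]
loss = tcn_triplet_loss(model, *clips)
loss.backward()
```

Because several consecutive frames enter the encoder at once, the embedding can in principle reflect how the scene is changing (velocity), not only where objects are (position), which is the property the abstract highlights.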

Cited by 54 publications
(31 citation statements)
References 35 publications
“…Sermanet et al. (2017) used a metric learning loss to obtain a temporally coherent, viewpoint-invariant embedding space from multi-view or single-view images, and used nearest neighbors in that embedding space for imitation learning. Dwibedi et al. (2018) extended the approach to multiple frames, and Dwibedi et al. (2019) used a temporal cycle consistency (TCC) loss that matches frames over time across videos. Finn and Levine (2016) and Ebert et al. (2018) presented a deep action-conditioned visual foresight model with model-predictive control for learning from pixels directly.…”
Section: Related Work
confidence: 99%
“…The sample-efficiency gains from reconstruction-based auxiliary losses have been benchmarked in [3,29,37]. Recently, contrastive learning has been used to extract reward signals in the latent space [38][39][40] and to study representation learning on Atari games [41].…”
Section: Auxiliary Tasks
confidence: 99%
“…In RL, self-supervision has also gained momentum in recent years [21,43,48], with temporal information being featured [1]. Notably, several works [3,12,20,42] leverage temporal consistency to learn useful representations, effectively learning to discriminate between observations that are temporally close and observations that are temporally distant. In comparison to all these works, we estimate the arrow of time through temporal order prediction with the explicit goal of finding irreversible transitions or actions.…”
Section: Related Work
confidence: 99%
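The last citation statement contrasts temporal-consistency objectives with temporal order prediction ("arrow of time"). The following is a minimal, hypothetical sketch of that general idea, not taken from any of the cited papers: the class `OrderClassifier`, the helper `order_loss`, and the embedding size `EMB_DIM` are assumptions. A small classifier receives two observation embeddings and predicts whether they appear in the correct temporal order.

```python
# Illustrative sketch of temporal order prediction (assumed names and sizes;
# not the cited papers' implementations). A classifier over pairs of embeddings
# predicts whether the pair is presented in the correct temporal order.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 32  # assumed embedding size


class OrderClassifier(nn.Module):
    def __init__(self, emb_dim=EMB_DIM):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, z_a, z_b):
        # Returns a logit: > 0 means "z_a is predicted to precede z_b".
        return self.head(torch.cat([z_a, z_b], dim=1)).squeeze(1)


def order_loss(clf, z_t, z_t_plus_k):
    """Train on each pair in both orders; label 1 = correct order, 0 = reversed."""
    logits_fwd = clf(z_t, z_t_plus_k)
    logits_bwd = clf(z_t_plus_k, z_t)
    labels_fwd = torch.ones_like(logits_fwd)
    labels_bwd = torch.zeros_like(logits_bwd)
    return (F.binary_cross_entropy_with_logits(logits_fwd, labels_fwd)
            + F.binary_cross_entropy_with_logits(logits_bwd, labels_bwd)) / 2


# Example with random embeddings standing in for encoded observations z_t, z_{t+k}.
clf = OrderClassifier()
z_t, z_t_plus_k = torch.randn(8, EMB_DIM), torch.randn(8, EMB_DIM)
loss = order_loss(clf, z_t, z_t_plus_k)
loss.backward()
```

Pairs for which the classifier is confidently correct in one direction but systematically wrong in the other are candidates for the irreversible transitions that the quoted work aims to detect.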