Self-Supervised Visual Descriptor Learning for Dense Correspondence

Schmidt, Tanner; Newcombe, Richard; Fox, Dieter

doi:10.1109/lra.2016.2634089

Cited by 151 publications

(120 citation statements)

References 24 publications

Supporting

Mentioning

118

Contrasting


“…Our work is broadly related to methods that learn pixel embeddings invariant to certain transforms. These approaches leverage tracking to obtain correspondence labels, and learn representations invariant to viewpoint transformation [36,51] or motion [46]. Similar to self-supervised correspondence approaches, these are also limited to training using observations of the same instance, and do not generalize well across instances.…”

Section: Related Work

mentioning

confidence: 99%

Canonical Surface Mapping via Geometric Cycle Consistency

Kulkarni

Tulsiani

Gupta

2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Figure 1: We study the task of Canonical Surface Mapping (CSM). This task is a generalization of keypoint estimation and involves mapping pixels to canonical 3D models. We learn CSM prediction without requiring correspondence annotations, by instead using geometric cycle consistency as supervision. This allows us to train CSM prediction for diverse classes, including rigid and non-rigid objects.

Abstract: We explore the task of Canonical Surface Mapping (CSM). Specifically, given an image, we learn to map pixels on the object to their corresponding locations on an abstract 3D model of the category. But how do we learn such a mapping? A supervised approach would require extensive manual labeling which is not scalable beyond a few hand-picked categories. Our key insight is that the CSM task (pixel to 3D), when combined with 3D projection (3D to pixel), completes a cycle. Hence, we can exploit a geometric cycle consistency loss, thereby allowing us to forgo the dense manual supervision. Our approach allows us to train a CSM model for a diverse set of classes, without sparse or dense keypoint annotation, by leveraging only foreground mask labels for training. We show that our predictions also allow us to infer dense correspondence between two images, and compare the performance of our approach against several methods that predict correspondence by leveraging varying amount of supervision.

* the last two authors were equally uninvolved.


“…Unlike in previous work which trained robotic-supervised correspondence models only for static environments [7], we now would like to train correspondence models with dynamic environments. Other prior work [6] has used dynamic non-rigid reconstruction [35] to address dynamic scenes. The approach we demonstrate here instead is to correspond pixels between two camera views with images that are approximately synchronized in time, similar to the full-image-embedding training in [17], but here for pixel-to-pixel correspondence.…”

Section: Multi-view Time-synchronized Correspondence Training

mentioning

confidence: 99%

Self-Supervised Correspondence in Visuomotor Policy Learning

Florence

Manuelli

Tedrake

2020

IEEE Robot. Autom. Lett.


In this paper we explore using self-supervised correspondence for improving the generalization performance and sample efficiency of visuomotor policy learning. Prior work has primarily used approaches such as autoencoding, pose-based losses, and end-to-end policy optimization in order to train the visual portion of visuomotor policies. We instead propose an approach using self-supervised dense visual correspondence training, and show this enables visuomotor policy learning with surprisingly high generalization performance with modest amounts of data: using imitation learning, we demonstrate extensive hardware validation on challenging manipulation tasks with as few as 50 demonstrations. Our learned policies can generalize across classes of objects, react to deformable object configurations, and manipulate textureless symmetrical objects in a variety of backgrounds, all with closed-loop, real-time vision-based policies. Simulated imitation learning experiments suggest that correspondence training offers sample complexity and generalization benefits compared to autoencoding and end-to-end training.


“…We are, of course, not the first to learn dense representations on visual data. Most prior work on this topic revolves around learning correspondences across views in 2D [6,21] and 3D [31,23,22,3]. Florence et al. [12] proposed dense object nets, learning dense descriptors by multi-view reconstruction and applying the descriptors to manipulation tasks.…”

Section: Related Work

mentioning

confidence: 99%

DensePhysNet: Learning Dense Physical Object Representations Via Multi-Step Dynamic Interactions

Zeng

et al. 2019

Robotics: Science and Systems XV


We study the problem of learning physical object representations for robot manipulation. Understanding object physics is critical for successful object manipulation, but also challenging because physical object properties can rarely be inferred from the object's static appearance. In this paper, we propose DensePhysNet, a system that actively executes a sequence of dynamic interactions (e.g., sliding and colliding), and uses a deep predictive model over its visual observations to learn dense, pixel-wise representations that reflect the physical properties of observed objects. Our experiments in both simulation and real settings demonstrate that the learned representations carry rich physical information, and can directly be used to decode physical object properties such as friction and mass. The use of dense representation enables DensePhysNet to generalize well to novel scenes with more objects than in training. With knowledge of object physics, the learned representation also leads to more accurate and efficient manipulation in downstream tasks than the state-of-the-art. Video is available at http://zhenjiaxu.com/DensePhysNet
