2015 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2015.308

Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views

Abstract: Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs. We believe that 3D models have the potential to generate a large number of images of high variation, which can be well exploited by a deep CNN with a high …

Cited by 619 publications (697 citation statements). References 33 publications (51 reference statements).
“…Methods in the first category, such as [21] and [13], predict 2D keypoints from an image and then use 3D object models to predict the 3D pose given these keypoints. Methods in the second category, such as Viewpoints and Keypoints (V&K) [20] and Render-for-CNN [17], which are closer to what we do, predict 3D pose directly given an image. Both of these methods discretize the pose space into bins and solve a pose classification problem.…”
Section: Introduction
Mentioning confidence: 73%
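The snippet above notes that both V&K and Render-for-CNN discretize the pose space into bins and solve a classification problem. Below is a minimal sketch of that bin-based formulation, assuming fine-grained azimuth bins; `NUM_BINS` and `angle_to_bin` are illustrative names, not from either paper.

```python
import torch
import torch.nn as nn

NUM_BINS = 360  # assumed fine-grained binning: one bin per degree of azimuth

def angle_to_bin(azimuth_deg: float) -> int:
    """Map a continuous azimuth in degrees to a discrete class label."""
    return int(azimuth_deg % 360) * NUM_BINS // 360

# Viewpoint estimation reduced to plain N-way classification over angle bins.
logits = torch.randn(8, NUM_BINS)  # stand-in for CNN outputs on a batch of 8 crops
targets = torch.tensor([angle_to_bin(a) for a in
                        [0.0, 45.5, 90.0, 135.2, 180.0, 225.9, 270.0, 359.4]])
loss = nn.CrossEntropyLoss()(logits, targets)
```

At test time the predicted viewpoint is simply the arg-max bin, optionally refined by interpolating over neighbouring bin scores.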
“…They have a similar network architecture, which is shared across object categories up to the second-last layer, with a separate output layer for every category. While V&K [20] uses a standard cross-entropy loss for classification, Render-for-CNN [17] uses a weighted cross-entropy loss that respects the circular symmetry of angles. While V&K [20] uses jittered bounding boxes with sufficient overlap to augment the annotated training data, Render-for-CNN [17] uses rendered images with a well-sampled distribution over pose space, random crops, and backgrounds.…”
Section: Introduction
Mentioning confidence: 99%
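One plausible form of the "weighted cross-entropy loss that respects the circular symmetry of angles" mentioned above is a soft target that decays with circular angular distance from the ground-truth bin. This is a hedged sketch, not the exact loss from Render-for-CNN [17]; the bandwidth `sigma` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def circular_soft_cross_entropy(logits, target_bins, num_bins=360, sigma=5.0):
    """Cross-entropy against a soft target distribution that decays with
    circular distance from the ground-truth bin, so neighbouring bins
    (including the 359/0 wrap-around) are penalised less than distant ones."""
    bins = torch.arange(num_bins, device=logits.device)       # (B,)
    diff = (bins[None, :] - target_bins[:, None]).abs()       # (N, B)
    dist = torch.minimum(diff, num_bins - diff).float()       # circular distance
    weights = torch.exp(-dist / sigma)                        # soft target mass
    weights = weights / weights.sum(dim=1, keepdim=True)      # normalise to a distribution
    log_probs = F.log_softmax(logits, dim=1)
    return -(weights * log_probs).sum(dim=1).mean()
```

As `sigma` shrinks, the soft target collapses onto the ground-truth bin and the loss approaches the standard cross-entropy that V&K [20] uses.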
“…It is shown that synthetic data is beneficial, especially in situations where few (or no) training instances are available, but 3D CAD models are. Su et al [33] follow a similar pipeline of rendering images from 3D models for viewpoint estimation, but with substantially more synthetic data, obtained, e.g., by deforming existing 3D models before rendering.…”
Section: Related Work
Mentioning confidence: 99%
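A render-for-training pipeline like the one the snippet describes needs, at minimum, a way to sample camera poses over a well-covered distribution and to convert them into camera extrinsics. The sketch below shows one such sampler; the distribution parameters are illustrative stand-ins, not the statistics estimated in [33].

```python
import numpy as np

def sample_viewpoint(rng):
    """Draw one camera pose. The distributions are assumed placeholders
    for the pose statistics a real pipeline would estimate from data."""
    azimuth = rng.uniform(0.0, 360.0)    # degrees around the object
    elevation = rng.normal(10.0, 15.0)   # cameras mostly slightly above the horizon
    tilt = rng.normal(0.0, 5.0)          # small in-plane rotation
    distance = rng.uniform(1.5, 3.0)     # camera-to-object distance
    return azimuth, elevation, tilt, distance

def look_at_rotation(azimuth, elevation):
    """World-to-camera rotation for a camera orbiting the origin."""
    az, el = np.deg2rad(azimuth), np.deg2rad(elevation)
    Rz = np.array([[np.cos(az), -np.sin(az), 0.0],
                   [np.sin(az),  np.cos(az), 0.0],
                   [0.0,         0.0,        1.0]])
    Rx = np.array([[1.0, 0.0,         0.0],
                   [0.0, np.cos(el), -np.sin(el)],
                   [0.0, np.sin(el),  np.cos(el)]])
    return Rx @ Rz
```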
“…It remains open how well synthesized images can be used as a proxy for real training data without sacrificing performance. To bridge the gap between synthetic renderings and realistic images, Su et al (2015) propose a pipeline for rendering 3D objects in common poses onto realistic backgrounds. Based on this, Massa et al (2016) learn a mapping from CNN features computed on a realistic photo to features from a rendering, both showing the same object in the same pose, thus improving matching.…”
Section: Related Work
Mentioning confidence: 99%
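The render-then-composite step the snippet refers to, pasting a rendered object onto a real photo, can be sketched with PIL as below. The paths, crop policy, and function name are illustrative assumptions, not the exact procedure of Su et al (2015).

```python
import random
from PIL import Image

def composite_on_background(render_path, background_path, rng=random):
    """Paste an RGBA rendering (alpha produced by the renderer) onto a
    random crop of a real photo, so the synthetic object inherits a
    realistic background."""
    obj = Image.open(render_path).convert("RGBA")
    bg = Image.open(background_path).convert("RGB")
    # Random crop of the background at the rendering's resolution.
    x = rng.randint(0, max(0, bg.width - obj.width))
    y = rng.randint(0, max(0, bg.height - obj.height))
    canvas = bg.crop((x, y, x + obj.width, y + obj.height))
    canvas.paste(obj, (0, 0), mask=obj)  # alpha channel masks the paste
    return canvas
```

Compositing onto varied real backgrounds is what keeps a CNN trained on renderings from overfitting to uniform synthetic backdrops, which is the domain gap the quoted passage is concerned with.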