Multi-view Convolutional Neural Networks for 3D Shape Recognition

Su, Hang; Maji, Subhransu; Kalogerakis, Evangelos; Learned-Miller, Erik

doi:10.1109/iccv.2015.114

Cited by 2,477 publications

(2,083 citation statements)

References 34 publications

“…In general, shape structures are defined by the arrangement of, and relations between, shape parts . Developing neural nets for structured shape representations requires a significant departure from existing works on convolutional neural networks (CNNs) for volumetric [Wu et al 2015;Girdhar et al 2016;Yumer and Mitra 2016;Wu et al 2016] or view-based [Su et al 2015;Qi et al 2016;Sinha et al 2016] shape representations. These works primarily adapt classical CNN architectures for image analysis.…”

Section: Introduction (mentioning)

confidence: 99%

Deformation-driven shape correspondence via shape recognition

Zhu

Lira

et al. 2017

ACM Trans. Graph.

Figure 1: We develop GRASS, a Generative Recursive Autoencoder for Shape Structures, which enables structural blending between two 3D shapes. Note the discrete blending of translational symmetries (slats on the chair backs) and rotational symmetries (the swivel legs). GRASS encodes and synthesizes box structures (bottom) and part geometries (top) separately. The blending is performed on fixed-length codes learned by the unsupervised autoencoder, without any form of part correspondences, given or computed.

Abstract: We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures hierarchical structures of man-made 3D objects of varying structural complexities despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. Finally, our structure synthesis framework is augmented by a second trained module that produces fine-grained part geometry, conditioned on global and local structural context, leading to a full generative pipeline for 3D shapes. We demonstrate that without supervision, our network learns meaningful structural hierarchies adhering to perceptual grouping principles, produces compact codes which enable applications such as shape classification and partial matching, and supports shape synthesis and interpolation with significant variations in topology and geometry.

“…Other approaches that are based on the ideas similar to the one presented in this paper are rolling feature maps 1 and multi-view networks [26]. The former explores a pooling over a set of transformations, but does not guarantee the transformation-invariance of the features learned.…”

Section: Multiple Instance Learning (mentioning)

confidence: 99%

TI-POOLING: Transformation-Invariant Pooling for Feature Learning in Convolutional Neural Networks

Laptev

Savinov

Buhmann

et al. 2016

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

In this paper we present a deep neural network topology that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods usually make use of dataset augmentation to address this issue, but this requires a larger number of model parameters and more training data, and results in significantly increased training time and a larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all the possible transformations of the input. On the other hand, we formulate features in convolutional neural networks to be transformation-invariant. We achieve that using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the most optimal "canonical" instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with a smaller number of parameters when compared to standard convolutional neural networks with dataset augmentation and to other baselines.
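The pooling idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the pooling operator is an element-wise maximum over the branch outputs, uses the four 90-degree rotations as the transformation set, and replaces the shared convolutional layers with a hypothetical `toy_features` function. All function names here are made up for illustration.

```python
from typing import Callable, List

Image = List[List[float]]


def rotate90(img: Image) -> Image:
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotations(img: Image) -> List[Image]:
    """Assumed transformation set: rotations by 0, 90, 180 and 270 degrees."""
    out = [img]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def toy_features(img: Image) -> List[float]:
    """Hypothetical shared feature extractor standing in for the conv layers.

    It weights pixels by their row and column index, so on its own it is
    NOT rotation-invariant.
    """
    h, w = len(img), len(img[0])
    f_row = sum(img[r][c] * (r + 1) for r in range(h) for c in range(w))
    f_col = sum(img[r][c] * (c + 1) for r in range(h) for c in range(w))
    return [f_row, f_col]


def ti_pool(
    img: Image,
    transforms: Callable[[Image], List[Image]] = rotations,
    features: Callable[[Image], List[float]] = toy_features,
) -> List[float]:
    """Apply the same extractor to every transformed copy (siamese branches),
    then take the element-wise maximum across branches (assumed pooling)."""
    branches = [features(t) for t in transforms(img)]
    return [max(vals) for vals in zip(*branches)]


if __name__ == "__main__":
    img = [[1, 2], [3, 4]]
    print(toy_features(img), toy_features(rotate90(img)))  # differ
    print(ti_pool(img), ti_pool(rotate90(img)))            # identical
```

Because the four rotations form a closed set, rotating the input only permutes the branches, so the pooled vector stays the same even though the raw features change. In a real network the pooled vector would be passed on to the fully-connected layers.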

“…The experimental results show that this method can preserve the shape information of 3D objects to a certain extent through transformation, but the transformation process itself changes the local and global structures of 3D shapes, resulting in the decrease of feature discrimination. Meanwhile, Su et al [2] proposed multi-view convolution network structure (Multi-View CNN, MVCNN) [13]. These authors use the multi-view 2D projection of the 3D object to extract a concise 3D feature descriptor for the classification and retrieval of 3D shapes.…”

Section: Related Work (mentioning)

confidence: 99%

“…Due to point clouds is not in a regular format, most researchers transform this data to regular 3D voxel grids or collections of images before sending them to a deep network architecture. The method of feature learning for 3D object recognition in depth learning can be roughly divided into three methods which including Mutiview based [1,2], volumetric representation based [3,4] and based on point cloud. The multi-view based method is to project the three-dimension shape into the two-dimension image space, and then use the method of depth learning to extract the two-dimension image.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Multi-level Feature Learning on Point Sets for 3D Object Recognition

Xiao

Ma

Huang

et al. 2018

dtcse

In recent years, deep learning has become an important method on point cloud for 3D object recognition. PointNet is the first neural network which could directly consume point cloud as input. However, the PointNet couldn't capture the local features. In this work, we introduce a multi-level feature extraction neural network which extracts the characteristics of the multi-level structure in PointNet. Experiments are conducted on the ModelNet40 dataset with several state-of-the-art methods. The proposed method achieves a higher accuracy on 3D object recognition with 89.4%. Experimental results have demonstrated the superior performance of the proposed multi-level feature learning network.
