Figure 1: Our method learns a 3D model of a deformable object category from 2D keypoints in unconstrained images. It comprises a deep network that learns to factorize shape and viewpoint and, at test time, performs monocular reconstruction.

Abstract

We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. We do so by learning a deep network that reconstructs a 3D object from a single view at a time, accounting for partial occlusions, and explicitly factoring the effects of viewpoint changes and object deformations. In order to achieve this factorization, we introduce a novel regularization technique. We first show that the factorization is successful if, and only if, there exists a certain canonicalization function of the reconstructed shapes. Then, we learn the canonicalization function together with the reconstruction one, which constrains the result to be consistent. We demonstrate state-of-the-art reconstruction results for methods that do not use ground-truth 3D supervision on a number of benchmarks, including Up3D and PASCAL3D+. Source code has been made available at https
Convolutional neural networks (CNNs) are powerful tools for understanding data with spatial structure such as photos. They are most commonly used in two dimensions, but they can also be applied more generally. One-dimensional CNNs are used for processing time series such as human speech. Three-dimensional CNNs have been used to analyze movement in 2+1 dimensional space-time [2] and to help drones find a safe place to land [3]. Three-dimensional convolutional deep belief networks have been used to recognize objects in 2.5D depth maps [4].

In [1], a sparse two-dimensional CNN is implemented to perform Chinese handwriting recognition. When a handwritten character is rendered at moderately high resolution on a two-dimensional grid, it looks like a sparse matrix. If we only calculate the hidden units of the CNN that can actually see some part of the input field the pen has visited, the workload decreases.

Sparsity is a useful optimization in two dimensions, and it is potentially even more useful in three or higher dimensions. This is related to the curse of dimensionality: an N × N × N cubic grid contains many more points than an N × N square grid. We have adapted the algorithm from [1] to implement sparse CNNs on a range of different graphs. CUDA GPU code for running sparse 2, 3 and 4 dimensional CNNs is available at: https://github.com/btgraham/SparseConvNet

The world we live in is three dimensional, and time can also be thought of as an extra dimension, so there are a large number of possible applications for three- and even four-dimensional CNNs. Figure 1 shows what happens to sparse 3D data as it passes through a CNN. In this paper I apply CNNs to a variety of sparse 3D datasets.

When applying CNNs to sparse data, it may be better to use small convolutional filters, as they do a better job of preserving sparsity in the computationally expensive lower layers of the network. To reduce the size of the filters, we have experimented with changing the underlying graph.
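The workload reduction described above can be illustrated with a minimal sketch (not the paper's CUDA implementation; the function name and stroke example are illustrative): given a boolean mask of visited input sites, only output units whose receptive field overlaps a visited site need to be computed.

```python
import numpy as np

def active_output_sites(input_mask, filter_size=3):
    """Boolean mask of output sites a `filter_size` valid convolution must
    compute, given a boolean mask of nonzero (visited) input sites."""
    h, w = input_mask.shape
    out_h, out_w = h - filter_size + 1, w - filter_size + 1
    out = np.zeros((out_h, out_w), dtype=bool)
    for i, j in zip(*np.nonzero(input_mask)):
        # every output whose receptive field contains (i, j) becomes active
        i0, i1 = max(0, i - filter_size + 1), min(out_h, i + 1)
        j0, j1 = max(0, j - filter_size + 1), min(out_w, j + 1)
        out[i0:i1, j0:j1] = True
    return out

# A sparse "pen stroke" on a 32x32 grid: few input sites are visited,
# so only a small fraction of the 30x30 output units are active.
grid = np.zeros((32, 32), dtype=bool)
grid[10, 5:15] = True  # a short horizontal stroke
active = active_output_sites(grid)
print(active.sum(), "active of", active.size, "output sites")
```

For this stroke, 36 of 900 output units are active; in three or four dimensions the savings grow with the size of the mostly empty grid, which is the point made above.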
See Figure 2.

Figure 3 shows a variety of objects from the SHREC2015 Non-rigid 3D Shape Retrieval dataset, each stored as a mesh of triangles in the OFF file format. The dataset contains 1200 exemplars split evenly between 50 classes (aliens, ants, armadillo, ...). The dataset was intended to be used for unsupervised learning, but as CNNs are most often used for supervised learning, we used 6-fold cross-validation to measure the ability of our 3D CNNs to learn shapes. To stop the dataset being too easy, we randomly rotated the objects during training and testing. This forces the CNN to truly learn to recognize shape, rather than relying on some classes of objects tending to have a certain orientation. We tested a variety of network architectures to explore the trade-off between speed and accuracy.

Figure 4 shows an image from the Recognizing Human Actions video dataset. Taking differences between successive frames converts the dataset to a collection of sparse 2+1 dimensional objects. We also experimented with the more complicated UCF101 video dataset.

[1] Ben G...
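The frame-differencing step can be sketched as follows, assuming grayscale frames stacked in a NumPy array (the function name and threshold are illustrative, not from the paper): subtracting successive frames zeroes out the static background, leaving a sparse 2+1 dimensional volume in which only moving regions are nonzero.

```python
import numpy as np

def sparse_spacetime(frames, threshold=0.1):
    """Stack absolute differences of successive frames, zeroing small values.

    frames: array of shape (T, H, W); returns a (T-1, H, W) volume that is
    mostly zero wherever the scene does not change between frames.
    """
    diffs = np.abs(np.diff(frames.astype(float), axis=0))
    diffs[diffs < threshold] = 0.0
    return diffs

# Static background with a single moving "object": the difference volume
# has only a handful of nonzero sites out of the whole space-time grid.
T, H, W = 4, 8, 8
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, t, t] = 1.0  # one bright pixel moving along the diagonal
vol = sparse_spacetime(frames)
print(np.count_nonzero(vol), "nonzero of", vol.size, "space-time sites")
```

Each frame-to-frame difference records the object leaving one site and arriving at another, so the nonzero sites trace the motion while the background contributes nothing.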