We address the problem of 3D rotation equivariance in convolutional neural networks. 3D rotations have been a challenging nuisance in 3D classification tasks, requiring higher network capacity and extended data augmentation to handle. We model 3D data with multivalued spherical functions and propose a novel spherical convolutional network that implements exact convolutions on the sphere by realizing them in the spherical harmonic domain. The resulting filters have local symmetry and are localized by enforcing smooth spectra. We apply a novel pooling in the spectral domain, and our operations are independent of the underlying spherical resolution throughout the network. We show that networks with much lower capacity and without data augmentation can exhibit performance comparable to the state of the art in standard retrieval and classification benchmarks.
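For orientation, the identity that lets a spherical convolution be computed in the harmonic domain can be written as below. This is the standard spherical convolution theorem in the Driscoll-Healy normalization, not a formula quoted from the abstract; the symbols $f$, $h$, $\hat{f}^{\ell}_{m}$ and $\hat{h}^{\ell}_{0}$ are notation assumed here, and the constant factor depends on the normalization convention chosen for the spherical harmonics.

```latex
% f: spherical signal, h: filter, \hat{f}^{\ell}_{m}: spherical harmonic
% coefficient of degree \ell and order m (notation assumed for illustration).
\widehat{(f * h)}^{\ell}_{m}
  = 2\pi \sqrt{\frac{4\pi}{2\ell + 1}}\; \hat{f}^{\ell}_{m}\, \hat{h}^{\ell}_{0}
```

Only the order-zero coefficients $\hat{h}^{\ell}_{0}$ of the filter appear, which is consistent with the filters having local (zonal) symmetry as the abstract states.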
Automatically assigning keywords to images is of great interest as it allows one to index, retrieve, and understand large collections of image data. Many techniques have been proposed for image annotation in the last decade that give reasonable performance on standard datasets. However, most of these works fail to compare their methods with simple baseline techniques to justify the need for complex models and subsequent training. In this work, we introduce a new baseline technique for image annotation that treats annotation as a retrieval problem. The proposed technique utilizes low-level image features and a simple combination of basic distances to find the nearest neighbors of a given image. The keywords are then assigned using a greedy label transfer mechanism. The proposed baseline outperforms the current state-of-the-art methods on two standard datasets and one large Web dataset. We believe that such a baseline measure will provide a strong platform to compare and better understand future annotation techniques.
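The retrieval-then-transfer idea described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the function names `combined_distance` and `annotate`, the use of an unweighted average of L1 distances, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def combined_distance(feats_a, feats_b):
    """Average of per-feature L1 distances (an assumed, unweighted combination).

    Each argument is a list of feature vectors, one vector per feature type.
    """
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        total += sum(abs(x - y) for x, y in zip(fa, fb))
    return total / len(feats_a)


def annotate(query_feats, train_set, k=3, n_labels=2):
    """Assign up to n_labels keywords to a query image.

    train_set is a list of (features, labels) pairs. Labels are taken first
    from the nearest neighbor, then from the remaining k - 1 neighbors in
    order of how often each label occurs among them.
    """
    ranked = sorted(
        train_set, key=lambda item: combined_distance(query_feats, item[0])
    )
    neighbors = ranked[:k]

    # Step 1: greedily take labels from the single nearest neighbor.
    chosen = list(neighbors[0][1][:n_labels])

    # Step 2: fill remaining slots with the most frequent labels
    # among the other neighbors, skipping labels already chosen.
    if len(chosen) < n_labels:
        counts = Counter(
            label
            for _, labels in neighbors[1:]
            for label in labels
            if label not in chosen
        )
        for label, _ in counts.most_common():
            chosen.append(label)
            if len(chosen) == n_labels:
                break
    return chosen


# Hypothetical toy data: one 2D feature vector per image.
train = [
    ([[0.0, 0.0]], ["sky"]),
    ([[0.1, 0.0]], ["sea", "sky"]),
    ([[0.2, 0.1]], ["sea", "boat"]),
    ([[5.0, 5.0]], ["car"]),
]
result = annotate([[0.0, 0.05]], train, k=3, n_labels=2)
```

No model is trained here: all of the work is in the distance function and the transfer rule, which is what makes this kind of method usable as a baseline.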
We propose a novel technique for the registration of 3D point clouds which makes very few assumptions: we avoid any manual rough alignment or the use of landmarks, displacement can be arbitrarily large, and the two point sets can have very little overlap. Crude alignment is achieved by estimation of the 3D rotation from two Extended Gaussian Images (EGIs), even when the data sets inducing them have only partial overlap. The technique is based on the correlation of the two EGIs in the Fourier domain and makes use of the spherical and rotational harmonic transforms. For pairs with low overlap which fail a critical verification step, the rotational alignment can be obtained by the alignment of constellation images generated from the EGIs. Rotationally aligned sets are matched by correlation using the Fourier transform of volumetric functions. A fine alignment is acquired in the final step by running Iterative Closest Points with just a few iterations.
The availability of affordable and portable depth sensors has made scanning objects and people simpler than ever. However, dealing with occlusions and missing parts is still a significant challenge. The problem of reconstructing a (possibly non-rigidly moving) 3D object from a single or multiple partial scans has received increasing attention in recent years. In this work, we propose a novel learning-based method for the completion of partial shapes. Unlike the majority of existing approaches, our method focuses on objects that can undergo non-rigid deformations. The core of our method is a variational autoencoder with graph convolutional operations that learns a latent space for complete realistic shapes. At inference, we optimize to find the representation in this latent space that best fits the generated shape to the known partial input. The completed shape exhibits a realistic appearance on the unknown part. We show promising results towards the completion of synthetic and real scans of human body and face meshes exhibiting different styles of articulation and partiality.
This paper addresses the problem of rotation estimation directly from images defined on the sphere and without correspondence. The method is particularly useful for the alignment of large rotations and has potential impact on 3D shape alignment. The foundation of the method lies in the fact that the spherical harmonic coefficients undergo a unitary mapping when the original image is rotated. The correlation between two images is a function of rotations and we show that it has an SO(3)-Fourier transform equal to the pointwise product of spherical harmonic coefficients of the original images. The resolution of the rotation space depends on the bandwidth we choose for the harmonic expansion and the rotation estimate is found through a direct search in this 3D discretized space. A refinement of the rotation estimate can be obtained from the conservation of harmonic coefficients in the rotational shift theorem. A novel decoupling of the shift theorem with respect to the Euler angles is presented and exploited in an iterative scheme to refine the initial rotation estimates. Experiments show the suitability of the method for large rotations and the dependence of the method on bandwidth and the choice of the spherical harmonic coefficients.
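The principle of recovering a rotation from a pointwise product of harmonic coefficients can be illustrated with a much simpler planar analog. The sketch below works on the circle rather than the sphere: it uses the ordinary discrete Fourier transform in place of the spherical and SO(3) transforms, so it is an analogy under that assumption and not the method of the paper. The function names `dft`, `idft`, and `estimate_rotation` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def estimate_rotation(f, g):
    """Estimate the circular shift s such that g[t] == f[t - s].

    The correlation over all shifts is obtained in one step as the inverse
    transform of the pointwise product G[k] * conj(F[k]); the estimate is
    found by a direct search for the maximum over the discretized shifts.
    Returns (shift in samples, shift as an angle in radians).
    """
    F, G = dft(f), dft(g)
    spectrum = [gk * fk.conjugate() for fk, gk in zip(F, G)]
    corr = [c.real for c in idft(spectrum)]
    best = max(range(len(corr)), key=corr.__getitem__)
    return best, 2 * math.pi * best / len(corr)


# Hypothetical test signal on a circle sampled at n points.
n = 32
f = [
    math.sin(2 * math.pi * t / n) + 0.5 * math.cos(6 * math.pi * t / n + 0.4)
    for t in range(n)
]
g = f[-5:] + f[:-5]  # f rotated by 5 samples, so g[t] == f[t - 5]
shift, angle = estimate_rotation(f, g)
```

As in the abstract, the resolution of the estimate is tied to the sampling of the search space: here the shift can only be resolved to one of `n` discrete angles, which is why a separate refinement step is useful.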