Abstract. This paper deals with local 3D descriptors for surface matching. First, we categorize existing methods into two classes: Signatures and Histograms. Then, by discussion and experiments alike, we point out the key issues of uniqueness and repeatability of the local reference frame. Based on these observations, we formulate a novel comprehensive proposal for surface representation, which encompasses a new unique and repeatable local reference frame as well as a new 3D descriptor. The latter lies at the intersection between Signatures and Histograms, so as to possibly achieve a better balance between descriptiveness and robustness. Experiments on publicly available datasets as well as on range scans obtained with Spacetime Stereo provide a thorough validation of our proposal.
Deep convolutional neural networks trained end-to-end are the state-of-the-art methods to regress dense disparity maps from stereo pairs. These models, however, suffer from a notable decrease in accuracy when exposed to scenarios significantly different from the training set (e.g., real vs. synthetic images). We argue that it is extremely unlikely that enough samples can be gathered to achieve effective training/tuning in any target domain, thus making this setup impractical for many applications. Instead, we propose to perform unsupervised and continuous online adaptation of a deep stereo network, which allows for preserving its accuracy in any environment. However, this strategy is extremely computationally demanding and thus prevents real-time inference. We address this issue by introducing a new lightweight, yet effective, deep stereo architecture, Modularly ADaptive Network (MADNet), and developing a Modular ADaptation (MAD) algorithm, which independently trains sub-portions of the network. By deploying MADNet together with MAD we introduce the first real-time self-adaptive deep stereo system enabling competitive performance on heterogeneous datasets. Our code is publicly available at https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stereo.
Motivated by the increasing availability of 3D sensors capable of delivering both shape and texture information, this paper presents a novel descriptor for feature matching in 3D data enriched with texture. The proposed approach stems from the theory of a recently proposed descriptor for 3D data which relies on shape only, and represents its generalization to the case of multiple cues associated with a 3D mesh. The proposed descriptor, dubbed CSHOT, is demonstrated to notably improve the accuracy of feature matching in challenging object recognition scenarios characterized by the presence of clutter and occlusions.
Significant achievements have been attained in the field of dense stereo correspondence by local algorithms based on an adaptive support. Given the problem of matching two corresponding pixels within a local stereo process, the basic idea is to consider as support for each pixel only those points which lie on the same disparity plane, rather than those belonging to a fixed support. This paper proposes a novel support aggregation strategy which includes information obtained from a segmentation process. Experimental results on the Middlebury dataset demonstrate that our approach is effective in improving the state of the art.
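The adaptive-support idea described above can be illustrated with a minimal sketch. This is not the paper's actual method: the function name `aggregate_cost`, the weighting rule (weight 1 inside the center pixel's segment, an exponential decay on intensity difference outside it), the parameter `gamma`, and the use of a single grayscale image are all assumptions made for illustration only.

```python
import math

def aggregate_cost(costs, colors, segments, center, radius=1, gamma=10.0):
    """Weighted average of per-pixel matching costs in a window around `center`.

    costs, colors, segments: 2D lists of equal shape (grayscale assumed).
    Assumed weighting: pixels sharing the center's segment label get weight 1;
    all others get a weight that decays with intensity difference.
    """
    cy, cx = center
    height, width = len(costs), len(costs[0])
    weighted_sum = 0.0
    weight_total = 0.0
    # Clamp the window to the image borders.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            if segments[y][x] == segments[cy][cx]:
                weight = 1.0
            else:
                diff = abs(colors[y][x] - colors[cy][cx])
                weight = math.exp(-diff / gamma)
            weighted_sum += weight * costs[y][x]
            weight_total += weight
    return weighted_sum / weight_total
```

With a fixed square window, every pixel would contribute equally; here, pixels outside the center's segment that also differ strongly in intensity contribute almost nothing, which is the intuition behind restricting the support to points likely to share the same disparity.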
Camera relocalisation is an important problem in computer vision, with applications in simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques either match the current image against keyframes with known poses coming from a tracker, or establish 2D-to-3D correspondences between keypoints in the current image and points in the scene in order to estimate the camera pose. Recently, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but must be trained offline on the target scene, preventing relocalisation in new environments. In this paper, we show how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. Our adapted forests achieve relocalisation performance that is on par with that of offline forests, and our approach runs in under 150 ms, making it desirable for real-time systems that require online relocalisation.