We present a two-stage, geometry-aware approach for matching SIFT-like features in a fast and reliable manner. Our approach first uses a small sample of features to estimate the epipolar geometry between the images and leverages it for guided matching of the remaining features. This simple and generalized two-stage matching approach produces denser feature correspondences while allowing us to formulate an accelerated search strategy to gain significant speedup over the traditional matching. The traditional matching punitively rejects many true feature matches due to a global ratio test. The adverse effect of this is particularly visible when matching image pairs with repetitive structures. The geometry-aware approach prevents such preemptive rejection using a selective ratio-test and works effectively even on scenes with repetitive structures. We also show that the proposed algorithm is easy to parallelize and implement it on the GPU. We experimentally validate our algorithm on publicly available datasets and compare the results with state-of-the-art methods.
In this paper, we present a new multistage approach for SfM reconstruction of a single component. Our method begins with building a coarse 3D reconstruction using highscale features of given images. This step uses only a fraction of features and is fast. We enrich the model in stages by localizing remaining images to it and matching and triangulating remaining features. Unlike traditional incremental SfM, localization and triangulation steps in our approach are made efficient and embarrassingly parallel using geometry of the coarse model. The coarse model allows us to use 3D-2D correspondences based direct localization techniques to register remaining images. We further utilize the geometry of the coarse model to reduce the pair-wise image matching effort as well as to perform fast guided feature matching for majority of features. Our method produces similar quality models as compared to incremental SfM methods while being notably fast and parallel. Our algorithm can reconstruct a 1000 images dataset in 15 hours using a single core, in about 2 hours using 8 cores and in a few minutes by utilizing full parallelism of about 200 cores.
View-graph is an essential input to large-scale structure from motion (SfM) pipelines. Accuracy and efficiency of large-scale SfM is crucially dependent on the input viewgraph. Inconsistent or inaccurate edges can lead to inferior or wrong reconstruction. Most SfM methods remove 'undesirable' images and pairs using several, fixed heuristic criteria, and propose tailor-made solutions to achieve specific reconstruction objectives such as efficiency, accuracy, or disambiguation. In contrast to these disparate solutions, we propose a single optimization framework that can be used to achieve these different reconstruction objectives with task-specific cost modeling. We also construct a very efficient network-flow based formulation for its approximate solution. The abstraction brought on by this selection mechanism separates the challenges specific to datasets and reconstruction objectives from the standard SfM pipeline and improves its generalization. This paper demonstrates the application of the proposed view-graph framework with standard SfM pipeline for two particular use-cases, (i) accurate and ghost-free reconstructions of highly ambiguous datasets using costs based on disambiguation priors, and (ii) accurate and efficient reconstruction of large-scale Internet datasets using costs based on commonly used priors.
Figure 1: Panoramas are widely available online, and more and more video content of these places is shared online. With these data, our video-collection+context interface visualizes the dynamic changes within a collection. The right-hand side shows our spatio-temporal index as a heat map (left), inlayed video foci (center), and fast search with spatial mouse scrubbing (right).
ABSTRACTVideo collections of places show contrasts and changes in our world, but current interfaces to video collections make it hard for users to explore these changes. Recent state-of-the-art interfaces attempt to solve this problem for 'outside→in' collections, but cannot connect 'inside→out' collections of the same place which do not visually overlap. We extend the focus+context paradigm to create a video-collections+context interface by embedding videos into a panorama. We build a spatio-temporal index and tools for fast exploration of the space and time of the video collection. We demonstrate the flexibility of our representation with interfaces for desktop and mobile flat displays, and for a spherical display with joypad and tablet controllers. We study with users the effect of our video-collection+context system to spatio-temporal localization tasks, and find significant improvements to accuracy and completion time in visual search tasks compared to existing systems. We measure the usability of our interface with System Usability Scale (SUS) and task-specific questionnaires, and find our system scores higher.
Abstract. Short internet video clips like vines present a significantly wild distribution compared to traditional video datasets. In this paper, we focus on the problem of unsupervised action classification in wild vines using traditional labeled datasets. To this end, we use a data augmentation based simple domain adaptation strategy. We utilize semantic word2vec space as a common subspace to embed video features from both, labeled source domain and unlabled target domain. Our method incrementally augments the labeled source with target samples and iteratively modifies the embedding function to bring the source and target distributions together. Additionally, we utilize a multi-modal representation that incorporates noisy semantic information available in form of hash-tags. We show the effectiveness of this simple adaptation technique on a test set of vines and achieve notable improvements in performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.