Image matching is a fundamental problem in computer vision. In the context of feature-based matching, SIFT and its variants have long excelled in a wide array of applications. However, under ultra-wide baselines, as with aerial images captured under large camera rotations, the appearance variation exceeds the reach of SIFT and RANSAC. In this paper we propose a data-driven, deep learning-based approach that sidesteps local correspondence by framing the problem as a classification task. We further demonstrate that local correspondences can still be useful: we incorporate an attention mechanism that produces a set of probable matches, which allows us to further increase performance. We train our models on a dataset of urban aerial imagery, collected for this purpose, consisting of 'same' and 'different' pairs, and we characterize the problem via a human study with annotations from Amazon Mechanical Turk. We demonstrate that our models outperform the state of the art on ultra-wide-baseline matching and approach human accuracy.
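To make the "matching as classification" framing concrete, here is a minimal sketch, not the authors' architecture: a shared (siamese) CNN encodes both views of a pair and a small head emits a single "same scene" logit. It omits the attention mechanism, and all layer sizes and input resolutions are illustrative assumptions.

```python
# Hypothetical sketch of pair matching as binary classification (PyTorch).
import torch
import torch.nn as nn

class SiamesePairClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder applied identically to both images of a pair.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head classifies the concatenated pair embedding: same vs. different.
        self.head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1),
        )

    def forward(self, a, b):
        za, zb = self.encoder(a), self.encoder(b)
        return self.head(torch.cat([za, zb], dim=1))  # logit for "same"

# Toy usage: a batch of 4 image pairs labeled "same".
model = SiamesePairClassifier()
a = torch.randn(4, 3, 128, 128)
b = torch.randn(4, 3, 128, 128)
loss = nn.BCEWithLogitsLoss()(model(a, b).squeeze(1), torch.ones(4))
```

The key design point is that no explicit local correspondences are estimated; the network is trained end-to-end on 'same'/'different' supervision alone.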
In computer vision, extracting effective features for the detection and description of salient image regions is a key step in many applications. Traditionally, these features are extracted with hand-engineered detectors and descriptors; approaches adopting this paradigm are generally referred to as keypoint-based or feature-based. Recently, the reintroduction of neural networks into many computer vision tasks has broadly replaced hand-engineered feature-based approaches. Neural-network-based approaches generally learn feature extraction as part of an end-to-end pipeline. While these approaches have shown great success in tasks such as object detection and classification, other tasks such as structure-from-motion (SfM) still depend on purely engineered features, e.g., SIFT, to detect and describe keypoints.

In this paper, we propose a model that learns what constitutes a good keypoint, captures keypoints at multiple scales, and learns to decide whether two keypoints match. We achieve multiscale keypoint detection with a fully convolutional network that recursively applies convolutions to regress keypoint scores. With each successive convolution, the network evaluates image patches, i.e., keypoints, at a larger scale. By extracting the keypoint feature map after each convolution, we obtain a feature map that resembles a keypoint scale-space. To learn descriptors for keypoint matching, we leverage a triplet network to learn an embedding in which patches of matching keypoints lie closer to each other than non-matching patches. Figure 1 provides an overview of the proposed model.

There is currently no large-scale dataset for learning both keypoint detectors and descriptors from image patches, and collecting human-annotated training examples for deep neural networks would be prohibitively expensive. We therefore create our own dataset by following a self-supervised approach. We use SfM to construct a large-scale model of 1.3 million 3D points, from which we extract matching patches with varying photometric properties such as scale, illumination, and perspective. Although these feature detections and matches were originally determined with engineered features, SfM factors in the underlying geometry, which allows us to learn features that extend beyond their engineered counterparts.

We evaluate the proposed model both quantitatively and qualitatively and show that it is capable of identifying multiscale keypoints as well as matching them. We show that the descriptors outperform previous approaches and demonstrate transferability to unseen datasets with different statistics; Figure 2 shows an example.
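The descriptor-learning step above is a triplet objective. Below is a minimal sketch under stated assumptions (toy patch size, toy layer widths, and a margin of 0.5 are all illustrative, not the paper's values): a small embedding network maps patches to unit-length descriptors, and a standard triplet margin loss pulls matching patches together and pushes non-matching ones apart.

```python
# Hypothetical sketch of triplet-based descriptor learning (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDescriptor(nn.Module):
    """Toy embedding network for fixed-size grayscale patches."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, patch):
        # L2-normalize so Euclidean distance behaves consistently.
        return F.normalize(self.net(patch), dim=1)

# Toy usage: anchor/positive come from the same 3D point, negative does not.
net = PatchDescriptor()
anchor, positive, negative = (torch.randn(8, 1, 32, 32) for _ in range(3))
loss = nn.TripletMarginLoss(margin=0.5)(
    net(anchor), net(positive), net(negative))
```

In the self-supervised setup described above, the anchor/positive pairs would be patches reprojected from the same SfM 3D point, with negatives drawn from other points.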
Automatic evaluation of human facial attractiveness is a challenging problem that has received relatively little attention from the computer vision community. Previous work in this area has posed attractiveness as a classification problem. However, for applications that require fine-grained relationships between objects, learning to rank has been shown to be superior to the direct interpretation of classifier scores as ranks [27]. In this paper, we propose and implement a personalized relative beauty ranking system. Given training data of faces sorted according to a subject's personal taste, we learn to rank novel faces according to that taste. Using a blend of facial geometric relations, HOG, GIST, L*a*b* color histograms, and Dense-SIFT + PCA feature types, our system achieves an average accuracy of 63% on pairwise comparisons of novel test faces. We examine the effectiveness of our method through lesion (ablation) testing and find that the most effective feature types for predicting beauty preferences are HOG, GIST, and Dense-SIFT + PCA.
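A common way to realize pairwise rank learning of this kind is the RankSVM reduction: turn a sorted list into (preferred, less-preferred) feature-difference pairs and fit a linear classifier on the sign of the difference. The sketch below is illustrative only, with random stand-in vectors in place of the HOG/GIST/etc. features, and is not the paper's implementation.

```python
# Hypothetical RankSVM-style sketch of personalized pairwise ranking.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))        # stand-in face feature vectors
ranks = np.argsort(rng.normal(size=100))  # stand-in preference order (0 = best)

# Reduce ranking to classification on feature differences:
# label +1 if face i is preferred over face j, else -1.
pairs, labels = [], []
for i in range(len(feats)):
    for j in range(i + 1, len(feats)):
        pairs.append(feats[i] - feats[j])
        labels.append(1 if ranks[i] < ranks[j] else -1)

ranker = LinearSVC().fit(np.array(pairs), np.array(labels))
# The learned weight vector induces a scoring function over single faces:
scores = feats @ ranker.coef_.ravel()  # higher score = predicted more preferred
```

Pairwise accuracy, as reported above, is then simply the fraction of held-out pairs whose score ordering agrees with the subject's ordering.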
Correspondence matching is a core problem in computer vision. Under narrow-baseline viewing conditions, it has been successfully addressed with SIFT-like approaches; under wide-baseline conditions, however, these methods often fail. In this paper we propose a correspondence-estimation method that addresses this challenge for aerial scenes of urban environments. Our method creates synthetic views and leverages self-similarity cues to recover correspondences using a RANSAC-based approach aided by self-similarity graph-based sampling. We evaluate our method on 30 challenging image pairs and demonstrate improved performance over alternative methods in the literature.
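For readers less familiar with the hypothesize-and-verify loop at the heart of such methods, here is a minimal, generic RANSAC sketch, not the paper's self-similarity graph-based sampler: it fits a 2D affine map from putative point matches, scoring each hypothesis by its inlier count. The iteration count and inlier threshold are illustrative assumptions.

```python
# Generic RANSAC sketch for affine correspondence estimation (NumPy).
import numpy as np

def ransac_affine(src, dst, iters=500, thresh=3.0, seed=0):
    """src, dst: (N, 2) putative matches. Returns (best 2x3 affine, inliers)."""
    rng = np.random.default_rng(seed)
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])  # homogeneous source points
    best_model, best_inliers = None, 0
    for _ in range(iters):
        # Minimal sample: 3 correspondences determine an affine map (6 DOF).
        idx = rng.choice(n, size=3, replace=False)
        A, *_ = np.linalg.lstsq(src_h[idx], dst[idx], rcond=None)
        # Verify: count matches consistent with the hypothesized transform.
        residuals = np.linalg.norm(src_h @ A - dst, axis=1)
        inliers = int((residuals < thresh).sum())
        if inliers > best_inliers:
            best_model, best_inliers = A.T, inliers
    return best_model, best_inliers
```

The method described above keeps this verify step but replaces uniform minimal sampling with sampling guided by a self-similarity graph, which raises the odds of drawing an all-inlier sample under wide baselines.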