Visual landmark recognition from Internet photo collections: A large-scale evaluation

Weyand, Tobias; Leibe, Bastian

doi:10.1016/j.cviu.2015.02.002

Cited by 31 publications

(15 citation statements)

References 62 publications

(209 reference statements)

“…A related line of research is landmark recognition, where images are clustered by their geolocations and visual similarity to construct a database of popular landmarks. The database serves as the index of an image retrieval system [28,29,30,31,32,33] or the training data of a landmark classifier [34,35,36]. Cross-view geolocation recognition makes additional use of satellite or aerial imagery to determine query locations [37,38,39,40].…”

Section: Related Work (mentioning)

confidence: 99%

CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps

Seo, Weyand, Sim, et al. 2018

Lecture Notes in Computer Science

Self Cite

Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only few, possibly ambiguous cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off; using fewer but larger cells results in lower location accuracy while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each classifier votes for the fine-grained classes that overlap with their respective coarse-grained ones. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves the state-of-the-art performance in location recognition on multiple benchmark datasets.
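
The intersection-and-voting idea described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the 4x4 grid of base cells, the two hypothetical partitionings (`west_east`, `south_north`), the made-up classifier scores, and the choice to combine votes by summing scores are all assumptions made for illustration.

```python
from collections import defaultdict


def build_fine_classes(partitionings, base_cells):
    """Group base cells by the tuple of coarse cells that contain them.

    Each distinct tuple is one fine-grained class: the intersection of
    one coarse cell from every partitioning.
    """
    groups = defaultdict(list)
    for cell in base_cells:
        key = tuple(partition(cell) for partition in partitionings)
        groups[key].append(cell)
    return dict(groups)


def score_fine_classes(fine_classes, coarse_scores):
    """Let every classifier vote: a fine class receives the score of the
    coarse cell it lies in, summed over all partitionings (assumed rule)."""
    return {
        key: sum(scores[coarse] for scores, coarse in zip(coarse_scores, key))
        for key in fine_classes
    }


def west_east(cell):
    """Hypothetical partitioning 1: split the grid into two vertical halves."""
    return cell[0] // 2


def south_north(cell):
    """Hypothetical partitioning 2: split the grid into two horizontal halves."""
    return cell[1] // 2


# Toy "map": a 4x4 grid of base cells.
base_cells = [(x, y) for x in range(4) for y in range(4)]
fine = build_fine_classes([west_east, south_north], base_cells)

# Made-up per-classifier scores over their own coarse cells.
coarse_scores = [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}]
fine_scores = score_fine_classes(fine, coarse_scores)
best = max(fine_scores, key=fine_scores.get)
print(best, fine[best])
```

Here two classifiers with two classes each yield four fine-grained classes, and every coarse class still covers half of the map, which is the trade-off the abstract describes.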

“…Nevertheless, it allows to suggest that Siamese architecture together with contrastive loss objective is a good choice for learning features for image matching and retrieval tasks. Moreover, using additional relevant datasets [26], [27] during training might further enhance the accuracy and performance of the approach.…”

Section: Discussion (mentioning)

confidence: 99%

Siamese network features for image matching

Melekhov, Kannala, Rahtu 2016

2016 23rd International Conference on Pattern Recognition (ICPR)

Finding matching images across large datasets plays a key role in many computer vision applications such as structure-from-motion (SfM), multi-view 3D reconstruction, image retrieval, and image-based localisation. In this paper, we propose finding matching and non-matching pairs of images by representing them with neural network based feature vectors, whose similarity is measured by Euclidean distance. The feature vectors are obtained with convolutional neural networks which are learnt from labeled examples of matching and non-matching image pairs by using a contrastive loss function in a Siamese network architecture. Previously Siamese architecture has been utilised in facial image verification and in matching local image patches, but not yet in generic image retrieval or whole-image matching. Our experimental results show that the proposed features improve matching performance compared to baseline features obtained with networks which are trained for image classification task. The features generalize well and improve matching of images of new landmarks which are not seen at training time. This is despite the fact that the labeling of matching and non-matching pairs is imperfect in our training data. The results are promising considering image retrieval applications, and there is potential for further improvement by utilising more training image pairs with more accurate ground truth labels.
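
As a rough illustration of the training objective mentioned in this abstract, the sketch below computes a contrastive loss on the Euclidean distance between two feature vectors. It uses the widely known margin-based form; the exact formulation, the margin value, and the function name `contrastive_loss` are assumptions and may differ from what the cited paper uses.

```python
import math


def contrastive_loss(features_a, features_b, is_match, margin=1.0):
    """Contrastive loss for one image pair (assumed standard form).

    Matching pairs are penalised by their squared distance, so they are
    pulled together. Non-matching pairs are penalised only while they
    are closer than `margin`, so they are pushed apart up to the margin.
    """
    distance = math.dist(features_a, features_b)  # Euclidean distance
    if is_match:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


# A matching pair that is far apart incurs a large loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=True))
# A non-matching pair beyond the margin incurs no loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=False))
```

In a Siamese setup both feature vectors would come from the same network with shared weights; here they are plain lists so the example stays self-contained.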

“…Moreover, the point of view is consistently at the street level, without big changes in the vertical orientation. Non robotics datasets typically are not collected as sequences of frames [22], [23], [38], [69], [80], [81], [116]-[118], [122], [125], [190], [215], [242], [244], [246], [251]. In many cases, they are created by collections of online images, with variable viewpoints

[122] | 2008 | Urban | City | ∼6k | Label
Holidays [118] | 2008 | Outdoor | World | ∼2k | Label
Eynsham [21] | 2009 | Urban | City | ∼70k | GPS
St. Lucia [240], [241] | 2010 | Urban | City | ∼66k | GPS
European Cities 50k [22] | 2010 | Urban | Continent | ∼50k | Label
Geotagged StreetView [23] | 2010 | Urban | City | ∼17k | GPS
Rome 16k [242] | 2010 | Urban | City | ∼16k | Pose
Dubrovnik 6k [242] | 2010 | Urban | City | ∼6.8k | Pose
San Francisco [243] | 2011 | Urban | City | ∼1.06M | GPS
Alderley [45] | 2012 | Urban | City | ∼31k | GPS
7 Scenes [244] | 2013 | Indoor | Building | ∼43k | Pose
Nordland [155] | 2013 | Outdoor | Region | ∼143k | GPS
Google StreetView 62k [114] | 2014 | Urban | City | ∼62k | GPS
Freiburg Across Seasons [192], [245] | 2014 | Urban | City | ∼43k | GPS
Cambridge Landmarks [215] | 2015 | Urban | City | ∼10.8k | Pose
Paris500k [246] | 2015 | Urban | City | ∼504k | Label
Pittsburgh [117] | 2015 | Urban | City | ∼278k | GPS
Landmarks-full [80], [125] | 2016 | Urban | World | ∼192k | Label
NCLT [247] | 2016 | Outdoor + Indoor...…”

Section: A Datasets (mentioning)

confidence: 99%

A Survey on Deep Visual Place Recognition

2021

In recent years visual place recognition (VPR), i.e., the problem of recognizing the location of images, has received considerable attention from multiple research communities, spanning from computer vision to robotics and even machine learning. This interest is fueled on one hand by the relevance that visual place recognition holds for many applications and on the other hand by the unsolved challenge of making these methods perform reliably in different conditions and environments. This paper presents a survey of the state-of-the-art of research on visual place recognition, focusing on how it has been shaped by the recent advances in deep learning. We start discussing the image representations used in this task and how they have evolved from using hand-crafted to deep-learned features. We further review how metric learning techniques are used to get more discriminative representations, as well as techniques for dealing with occlusions, distractors, and shifts in the visual domain of the images. The survey also provides an overview of the specific solutions that have been proposed for applications in robotics and with aerial imagery. Finally the survey provides a summary of datasets that are used in visual place recognition, highlighting their different characteristics.

Index terms: Visual place recognition, image representation learning, deep learning.
