Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation

Kordopatis-Zilos, Giorgos; Galopoulos, Panagiotis; Papadopoulos, Symeon; Kompatsiaris, Ioannis

doi:10.1145/3460426.3463644

Cited by 15 publications

(11 citation statements)

References 38 publications

“…At test time, the trained model is used to extract image descriptors and perform a classic retrieval. While CosPlace might appear similar to previous classification-based works [25,30,35,42,52], given that they also partition a map into classes, there are substantial differences. These prior works tackle the task of global classification and group images within very large cells (up to hundreds of kilometers wide), building on the idea that nearer scenes have similar semantics (e.g.…”

Section: Related Work (mentioning)

confidence: 75%

“…Visual geo-localization as classification. An alternative approach to visual geo-localization is to consider it a classification problem [25,30,35,42,52]. These works build on the idea that two images coming from the same geographical region, although representing different scenes, are likely to share similar semantics, such as architectural styles, types of vehicles, vegetation, etc.…”

Section: Related Work (mentioning)

confidence: 99%

“…On the other hand, our partitioning strategy is designed to leverage the availability of dense data and ensure that if two images are from the same class, they visualize the same scene. Moreover, unlike [25,30,35,42,52], once trained our method can be used to perform geo-localization through image retrieval on any given geographical area. [Table residue, dataset and size: … 13k; Eynsham [13] 48k; St Lucia [34] 33k; NCLT [7] 3.8M; Oxford RobotCar [33] 27k; CMU [2] 128k; Pittsburgh250k [1] 278k; TokyoTM/247 [46] 189k; MSLS [50] 1.7M; San Francisco Landmark [8] 1.1M; Aachen [41] 4k; SF-XL (Ours) 41.2M]…”

Section: Related Work (mentioning)

confidence: 99%

Rethinking Visual Geo-localization for Large-Scale Applications

Gabriele, Masone, Caputo. 2022. Preprint.

Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform on a real-world city-wide VG application, we build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases, with a size 30x bigger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets, therefore we design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem avoiding the expensive mining needed by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to the previous state-of-the-art, CosPlace requires roughly 80% less GPU memory at train time, and it achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. Dataset, code and trained models are available for research purposes at https://github.com/gmberton/CosPlace.
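
The citation statements above describe classification-based geo-localization as partitioning a map into cells and treating each cell as a class. A minimal sketch of that idea follows. It assumes a uniform latitude/longitude grid, which is only one possible partitioning and is not CosPlace's own scheme; the function name `cell_class` and the parameter `cell_size_deg` are hypothetical.

```python
import math


def cell_class(lat, lon, cell_size_deg=10.0):
    """Map a latitude/longitude pair to an integer class id on a uniform grid.

    Assumption: cells are squares of `cell_size_deg` degrees; real systems
    may use adaptive or metric partitions instead.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    n_cols = math.ceil(360.0 / cell_size_deg)
    n_rows = math.ceil(180.0 / cell_size_deg)
    # Clamp so that lat == 90 and lon == 180 fall into the last row/column.
    row = min(int((lat + 90.0) // cell_size_deg), n_rows - 1)
    col = min(int((lon + 180.0) // cell_size_deg), n_cols - 1)
    return row * n_cols + col


# Two nearby photos share a class; a distant one gets a different class.
paris = cell_class(48.86, 2.35)
versailles = cell_class(48.80, 2.12)
tokyo = cell_class(35.68, 139.69)
```

With cells this coarse, images in one class show different scenes that merely share regional semantics, which is the contrast the statements draw with a partition dense enough that same-class images depict the same scene.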

“…10, we show a more comprehensive set of results than in the main paper, comprising all the aggregation methods that can be attached to the different backbones using our software. As seen in the literature, GeM pooling [60] outperforms in general SPOC [5], MAC [62], R-MAC [71], RRM [39].…”

Section: C2 Aggregation and Descriptors Dimensionality (mentioning)

confidence: 88%

“…Over the years, a number of such methods have been proposed, from shallow pooling layers [5,62] to more complex modules [2,37]. Our framework allows to compute results with a number of them, namely SPOC [5], MAC [62], R-MAC [71], RRM [39], GeM [60], NetVLAD [2] and CRN [37]. While a complete list of results with all aggregation methods is shown in Appendix C.2, in Tab.…”

Section: Aggregation and Descriptor Dimensionality (mentioning)

confidence: 99%
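
The aggregation methods named in the two statements above differ in how they collapse a convolutional feature map into a single descriptor. As a rough illustration, and not the benchmark's own code, generalized-mean (GeM) pooling can be sketched in plain Python. The nested-list layout (one list of activations per channel), the name `gem_pool`, and the defaults `p=3.0` and `eps=1e-6` are assumptions made for this example.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling: one value per channel.

    feature_map: list of channels, each a non-empty list of activations
    (assumed non-negative, e.g. taken after a ReLU).
    """
    pooled = []
    for channel in feature_map:
        # Clamp to a small positive value so fractional powers stay defined.
        clamped = [max(v, eps) for v in channel]
        mean_pow = sum(v ** p for v in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


channel = [[1.0, 2.0, 3.0]]
avg_like = gem_pool(channel, p=1.0)   # p = 1 is plain average pooling
gem = gem_pool(channel, p=3.0)        # cube root of the mean of cubes
max_like = gem_pool(channel, p=50.0)  # large p approaches max pooling
```

The exponent `p` interpolates between average pooling (the idea behind SPOC, up to a scale factor) and max pooling (the idea behind MAC).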

Deep Visual Geo-localization Benchmark

Gabriele, Mereu, Gabriele et al. 2022. Preprint.

In this paper, we propose a new open-source benchmarking framework for Visual Geo-localization (VG) that allows to build, train, and test a wide range of commonly used architectures, with the flexibility to change individual components of a geo-localization pipeline. The purpose of this framework is twofold: i) gaining insights into how different components and design choices in a VG pipeline impact the final results, both in terms of performance (recall@N metric) and system requirements (such as execution time and memory consumption); ii) establish a systematic evaluation protocol for comparing different methods. Using the proposed framework, we perform a large suite of experiments which provide criteria for choosing backbone, aggregation and negative mining depending on the use-case and requirements. We also assess the impact of engineering techniques like pre/post-processing, data augmentation and image resizing, showing that better performance can be obtained through somewhat simple procedures: for example, downscaling the images' resolution to 80% can lead to similar results with a 36% savings in extraction time and dataset storage requirement. Code and trained models are available at https://deep-vg-bench.herokuapp.com/.
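
The abstract reports performance as recall@N. A minimal sketch of how such a metric is commonly computed for retrieval is given below: a query counts as a hit if at least one of its top-N retrieved database items is a correct match. How "correct" is decided (for example, a distance threshold around the query position) is not stated here, so the sets of positives are simply taken as input; the function name `recall_at_n` and its arguments are hypothetical.

```python
def recall_at_n(ranked_predictions, positives, n_values=(1, 5, 10)):
    """Fraction of queries with at least one positive among the top-N results.

    ranked_predictions: per query, database ids ordered from best to worst.
    positives: per query, the set of database ids counted as correct
    (assumption: supplied by the caller).
    """
    if len(ranked_predictions) != len(positives):
        raise ValueError("need one set of positives per query")
    if not positives:
        raise ValueError("need at least one query")
    recalls = {}
    for n in n_values:
        hits = sum(
            1
            for preds, pos in zip(ranked_predictions, positives)
            if any(p in pos for p in preds[:n])
        )
        recalls[n] = hits / len(positives)
    return recalls


ranked = [[3, 7, 1], [2, 9, 4], [5, 6, 8]]
truth = [{3}, {4}, {0}]
scores = recall_at_n(ranked, truth, n_values=(1, 3))
```

In this toy example the first query is matched at rank 1, the second only at rank 3, and the third never, so recall@1 is 1/3 and recall@3 is 2/3.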