Interpretable Semantic Photo Geolocation

Theiner, Jonas; Müller-Budack, Eric; Ewerth, Ralph

doi:10.1109/wacv51458.2022.00154

Cited by 10 publications (14 citation statements)

References 25 publications

“…Next, we use an ensemble of hierarchical classification using all three resolutions. However, in agreement with [43], this method does not achieve a consistent improvement than considering only fine partitioning. Moreover, the ensemble increases inference time by almost 9%.…”

Section: A Implementation Details and Hyper-parameter Values A1 Adapt... (mentioning)

confidence: 81%

“…We train TransLocator in a unified multi-task framework for simultaneous geo-localization and scene recognition, and thus, our system can be applied to images from all environmental settings. Extensive experiments with TransLocator on four benchmark datasets - Im2GPS [13], Im2GPS3k [14], YFCC4k [50] and YFCC26k [43] shows a significant improvement of 5.5%, 14.1%, 4.9%, 9.9% continent-level accuracy over current state-of-the-art. We also obtain better qualitative results when we test TransLocator on challenging real-world images.…”

Section: Discussion (mentioning)

confidence: 93%

“…(ii) We propose a simple yet efficient fusion of two transformer branches, which helps TransLocator to learn robust features under extreme appearance variation. (iii) We achieve state-of-the-art performance on four datasets with a significant improvement of 5.5%, 14.1%, 4.9%, 9.9% continent-level geolocational accuracy on Im2GPS [13], Im2GPS3k [14], YFCC4k [50], and YFCC26k [43], respectively. We also qualitatively evaluate the effectiveness of the proposed method on real-world images.…”

Section: Introduction (mentioning)

confidence: 81%

“…Like Vo et al. [50], during training we excluded images taken by the same authors in our validation or test sets. We validated and tested our model on two randomly sampled subsets of images from the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M) [44], referred to as YFCC26k [43] and YFCC4k [50] containing 25,600 and 4536 images, respectively. Since the images of MP-16, YFCC26k and YFCC4k were sourced without any scene and user restrictions, these datasets contain images of landmarks and landscapes, but also ambiguous images with little to no geographical cues, such as photographs of food and portraits of people.…”

Section: Datasets (mentioning)

confidence: 99%

Where in the World is this Image? Transformer-based Geo-localization in the Wild

Pramanick, Nowara, Gleason et al. 2022

Preprint

Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include huge diversity of images due to different environmental scenarios, drastic variation in the appearance of the same location depending on the time of the day, weather, season, and more importantly, the prediction is made from a single image possibly having only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representation under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, interacts between its two parallel branches after each transformer layer and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets - Im2GPS [13], Im2GPS3k [14], YFCC4k [50], YFCC26k [43] and obtain 5.5%, 14.1%, 4.9%, 9.9% continent-level accuracy improvement over the state-of-the-art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.
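As a rough illustration of what "continent-level accuracy" typically means in this line of work, the sketch below computes the fraction of predictions that fall within a distance threshold of the ground-truth coordinates, using the great-circle (haversine) distance. The 2500 km threshold for "continent" and the other scale names are assumptions based on common practice in geolocation benchmarks, not values stated in the abstract, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed scale thresholds (km); not taken from the abstract above.
SCALES_KM = {"street": 1, "city": 25, "region": 200, "country": 750, "continent": 2500}


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at(preds, truths, threshold_km):
    """Fraction of predictions within threshold_km of the ground truth."""
    hits = sum(
        great_circle_km(p[0], p[1], t[0], t[1]) <= threshold_km
        for p, t in zip(preds, truths)
    )
    return hits / len(preds)


# Toy data: one prediction in Paris for a photo taken in Berlin, one exact match.
preds = [(48.8566, 2.3522), (40.7128, -74.0060)]
truths = [(52.5200, 13.4050), (40.7128, -74.0060)]
for name, km in SCALES_KM.items():
    print(name, accuracy_at(preds, truths, km))
```

Under these assumed thresholds, a model is credited at the continent scale even when it misses the country, which is why improvements are usually reported per scale.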

“…High performance in localisation indicates that the explanations often align with the bounding boxes or segmentation masks provided by human annotators. We consider two localisation metrics, the pointing game [57] and top-k intersection [46]. The pointing game measures whether the pixel with the highest importance is located within the object location.…”

Section: Evaluation Of Explanations (mentioning)

confidence: 99%
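The pointing game described in the statement above can be sketched in a few lines: find the pixel with the highest importance and check whether it lies inside the annotated object. This is a minimal sketch that assumes the importance map and the binary object mask are given as equally sized nested lists; the function names are hypothetical and it is not the evaluation code of the cited papers.

```python
def pointing_game_hit(importance, mask):
    """Return True if the highest-importance pixel lies inside the object mask."""
    best_value, best_pos = None, None
    for r, row in enumerate(importance):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (r, c)
    r, c = best_pos
    return bool(mask[r][c])


def pointing_game_accuracy(pairs):
    """Fraction of (importance, mask) pairs for which the top pixel is a hit."""
    hits = sum(pointing_game_hit(imp, m) for imp, m in pairs)
    return hits / len(pairs)


# Toy example: the top pixel is at row 0, column 1.
importance = [[0.1, 0.9],
              [0.2, 0.3]]
mask_hit = [[0, 1],
            [0, 0]]   # object covers the top pixel
mask_miss = [[1, 0],
             [0, 0]]  # object is elsewhere
print(pointing_game_accuracy([(importance, mask_hit), (importance, mask_miss)]))
```

Ties are broken here by taking the first maximum in row-major order, which is a design choice of this sketch.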

RELAX: Representation Learning Explainability

Wickstrøm, Trosten, Løkse et al. 2021

Preprint

Despite the significant improvements that representation learning via self-supervision has led to when learning from unlabeled data, no methods exist that explain what influences the learned representation. We address this need through our proposed approach, RELAX, which is the first approach for attribution-based explanations of representations. Our approach can also model the uncertainty in its explanations, which is essential to produce trustworthy explanations. RELAX explains representations by measuring similarities in the representation space between an input and masked out versions of itself, providing intuitive explanations and significantly outperforming the gradient-based baseline. We provide theoretical interpretations of RELAX and conduct a novel analysis of feature extractors trained using supervised and unsupervised learning, providing insights into different learning strategies. Finally, we illustrate the usability of RELAX in multi-view clustering and highlight that incorporating uncertainty can be essential for providing low-complexity explanations, taking a crucial step towards explaining representations.
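The core idea in the abstract, scoring input features by how similar the representation of a masked input stays to the representation of the full input, can be illustrated with a simplified sketch. It assumes a one-dimensional input, independent random binary masks, cosine similarity, and a toy `encoder`; all of these are illustrative assumptions, the uncertainty estimate mentioned in the abstract is omitted, and this should not be read as the authors' implementation.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def masked_similarity_importance(x, encoder, num_masks=500, keep_prob=0.5, seed=0):
    """Average similarity between encoder(x) and encoder(masked x), per kept feature."""
    rng = random.Random(seed)
    h = encoder(x)
    importance = [0.0] * len(x)
    for _ in range(num_masks):
        # Random binary mask: 1 keeps a feature, 0 zeroes it out.
        mask = [1 if rng.random() < keep_prob else 0 for _ in x]
        h_masked = encoder([xi * mi for xi, mi in zip(x, mask)])
        s = cosine(h, h_masked)
        for i, mi in enumerate(mask):
            importance[i] += s * mi
    return [v / num_masks for v in importance]


def toy_encoder(x):
    """Hypothetical encoder whose output depends only on the first feature."""
    return [x[0], 0.1]


scores = masked_similarity_importance([1.0, 1.0, 1.0], toy_encoder)
print(scores)
```

In this toy setup the first feature should receive the highest score, since masking it is the only change that moves the representation.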
