2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00721

Semantic Visual Localization

Abstract: Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method …

Cited by 245 publications (164 citation statements)
References: 77 publications
“…The latter type of methods have recently been shown to not perform consistently better than image retrieval methods [76], i.e., approaches that approximate the pose of the query image by the pose of the most similar database image [3,38,87]. As such, state-of-the-art methods for long-term visual localization at scale either rely on local features for matching [28,71,78,83,85,86] or use image retrieval techniques [2-4, 63, 80, 87, 94].…”
Section: Related Work (mentioning)
confidence: 99%
“…Specifically, traffic signs are detected from images and matched against a geo-referenced sign database, after which local bundle adjustment is conducted to estimate a fine-grained pose. More recently, [30] built dense semantic maps using image segmentation and conducted localization by matching both semantic and geometric cues. In contrast, the maps used in our approach only need to contain the lane graphs and the inferred sign map, the latter of which is computed without a human in the loop, while also only requiring a fraction of the storage used by dense maps.…”
Section: Related Work (mentioning)
confidence: 99%
“…The geotag of the most similar database image is then often used to approximate the pose of the query image [30,31,76,83]. Place recognition approaches can also be used as part of a visual localization pipeline [13,29,53,62,72]: 2D-3D matching can be restricted to the parts of the scene visible in a short list of n visually similar database images, resulting in one pose estimate per retrieved image. This restriction helps to avoid global ambiguities in a scene, e.g., caused by similar structures found in unrelated parts of a scene, during matching [54].…”
Section: Related Work (mentioning)
confidence: 99%
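
The retrieval step described in the statement above can be illustrated with a minimal sketch. It is not taken from any of the cited papers: the cosine-similarity ranking, the function names (`retrieve_shortlist`, `approximate_pose`, `candidate_points`), and the toy descriptors, poses, and visibility sets are all assumptions made for illustration. The sketch ranks database images by descriptor similarity, takes the pose of the top match as an approximate query pose, and collects the 3D points visible in each of the n shortlisted images so that 2D-3D matching could be restricted to them.

```python
import math
from typing import List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two global image descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_shortlist(query: Sequence[float],
                       db_descs: Sequence[Sequence[float]],
                       n: int) -> List[int]:
    """Indices of the n database images most similar to the query."""
    scored = [(cosine_similarity(query, d), i) for i, d in enumerate(db_descs)]
    # Highest similarity first; ties broken by the lower index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:n]]


def approximate_pose(query: Sequence[float],
                     db_descs: Sequence[Sequence[float]],
                     db_poses: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Approximate the query pose by the pose of the most similar database image."""
    best = retrieve_shortlist(query, db_descs, 1)[0]
    return db_poses[best]


def candidate_points(shortlist: Sequence[int],
                     visibility: Sequence[Set[int]]) -> List[Set[int]]:
    """One set of candidate 3D point ids per retrieved image (hypothetical visibility data)."""
    return [visibility[i] for i in shortlist]


# Toy data, invented for this example only.
db_descs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_poses = [(0.0, 0.0), (10.0, 5.0), (4.0, 4.0)]   # simplified 2D positions
visibility = [{1, 2}, {3, 4}, {2, 5}]               # 3D point ids seen per image
query = [0.9, 0.1]

shortlist = retrieve_shortlist(query, db_descs, n=2)
pose = approximate_pose(query, db_descs, db_poses)
per_image_points = candidate_points(shortlist, visibility)
```

Keeping one candidate set per retrieved image, rather than merging them, mirrors the "one pose estimate per retrieved image" idea in the statement: each shortlisted image would get its own matching and pose-estimation pass.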