Yohann Cabon scite author profile

Modern computer vision algorithms typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to generate fully labeled, dynamic, and photo-realistic proxy virtual worlds. We propose an efficient real-to-virtual world cloning method, and validate our approach by building and publicly releasing a new video dataset, called "Virtual KITTI", automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow. We provide quantitative experimental evidence suggesting that (i) modern deep learning algorithms pre-trained on real data behave similarly in real and virtual worlds, and (ii) pre-training on virtual data improves performance. As the gap between real and virtual worlds is small, virtual worlds enable measuring the impact of various weather and imaging conditions on recognition performance, all other things being equal. We show these factors may drastically affect otherwise high-performing deep models for tracking.

Procedural Generation of Videos to Train Deep Action Recognition Networks

Souza

Gaidon

Cabon

et al. 2017

102

Benchmarking Image Retrieval for Visual Localization

Pion

Humenberger

Csurka

et al. 2020

Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two tasks: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for these tasks. These algorithms are often trained for the goal of retrieving the same landmark under a large range of viewpoint changes. However, robustness to viewpoint changes is not necessarily desirable in the context of visual localization. This paper focuses on understanding the role of image retrieval for multiple visual localization tasks. We introduce a benchmark setup and compare state-of-the-art retrieval representations on multiple datasets. We show that retrieval performance on classical landmark retrieval/recognition tasks correlates with localization performance for some tasks but not all. This indicates a need for retrieval approaches specifically designed for localization tasks. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization.

Robust Image Retrieval-based Visual Localization using Kapture

Humenberger

Cabon

Guérin

et al. 2020

Preprint

In this paper, we present a versatile method for visual localization. It is based on robust image retrieval for coarse camera pose estimation and robust local features for accurate pose refinement. Our method is top ranked on various public datasets, showing its ability to generalize and its wide variety of applications. To facilitate experiments, we introduce kapture, a flexible data format and processing pipeline for structure from motion and visual localization that is released open source. We furthermore provide all datasets used in this paper in the kapture format to facilitate research and data processing. Code and datasets can be found at https://github.com/naver/kapture; more information, updates, and news can be found at https://europe.naverlabs.com/research/3d-vision/kapture.

Visual Localization by Learning Objects-Of-Interest Dense Match Regression

Weinzaepfel

Cabon

Humenberger

2019

SLAMANTIC - Leveraging Semantics to Improve VSLAM in Dynamic Environments

Schörghuber

Steininger

Cabon

et al. 2019

Virtual Worlds as Proxy for Multi-Object Tracking Analysis

Gaidon

Wang

Cabon

et al. 2016

Preprint

Large-scale Localization Datasets in Crowded Indoor Spaces

Lee

Ryu

Yeon

et al. 2021

Estimating the precise location of a camera using visual localization enables interesting applications such as augmented reality or robot navigation. This is particularly useful in indoor environments where other localization technologies, such as GNSS, fail. Indoor spaces impose interesting challenges on visual localization algorithms: occlusions due to people, textureless surfaces, large viewpoint changes, low light, repetitive textures, etc. Existing indoor datasets are either comparatively small or cover only a subset of the mentioned challenges. In this paper, we introduce 5 new indoor datasets for visual localization in challenging real-world environments. They were captured in a large shopping mall and a large metro station in Seoul, South Korea, using a dedicated mapping platform consisting of 10 cameras and 2 laser scanners. In order to obtain accurate ground truth camera poses, we developed a robust LiDAR SLAM which provides initial poses that are then refined using a novel structure-from-motion based optimization. We present a benchmark of modern visual localization algorithms on these challenging datasets, showing superior performance of structure-based methods using robust image features. The datasets are available at: https://naverlabs.com/datasets
