Vincent Casser

Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Learning to predict scene depth from RGB inputs is a challenging task for both indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion, where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher-quality results, is able to model moving objects and is shown to transfer across data domains, e.g., from outdoor to indoor scenes. The main idea is to introduce geometric structure into the learning process by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore, an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion, e.g., through learned flow. Our results are comparable in quality to those that used stereo as supervision, and significantly improve depth prediction on scenes and datasets which contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments by transferring models trained on data collected for robot navigation in urban scenes to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.

show abstract

Block-NeRF: Scalable Large Scene Neural View Synthesis

Tancik

Casser

Yan

et al. 2022

Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications

et al. 2018

We present a photo-realistic training and evaluation simulator (Sim4CV) with extensive applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full-featured physics-based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning. The simulator fully integrates several state-of-the-art tracking algorithms with a benchmark evaluation tool, as well as a deep neural network (DNN) architecture for training vehicles to drive autonomously. It generates synthetic photo-realistic datasets with automatic ground-truth annotations to easily extend existing real-world datasets, and provides extensive synthetic data variety through its ability to reconfigure synthetic worlds on the fly using an automatic world generation tool.

Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability

Newman

Fosco

Casser

et al. 2020

Unsupervised Monocular Depth and Ego-Motion Learning With Structure and Semantics

Casser

Pirk

Mahjourian

et al. 2019

We present an approach which takes advantage of both structure and semantics for unsupervised monocular learning of depth and ego-motion. More specifically, we model the motion of individual objects and learn their 3D motion vectors jointly with depth and ego-motion. We obtain more accurate results, especially for challenging dynamic scenes not addressed by previous approaches. This is an extended version of Casser et al. [1]. Code and models have been open sourced at: https