Varma, Arnav scite author profile

Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation

Dense depth estimation is essential to scene understanding for autonomous driving. However, recent self-supervised approaches on monocular videos suffer from scale inconsistency across long sequences. Utilizing data from the ubiquitously co-present global positioning systems (GPS), we tackle this challenge by proposing a dynamically-weighted GPS-to-Scale (g2s) loss to complement the appearance-based losses. We emphasize that GPS data is needed only during multimodal training, not at inference. The relative distance between frames captured through GPS provides a scale signal that is independent of the camera setup and scene distribution, resulting in richer learned feature representations. Through extensive evaluation on multiple datasets, we demonstrate scale-consistent and scale-aware depth estimation during inference, improving performance even when training with low-frequency GPS data.
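The GPS-to-Scale idea can be illustrated with a minimal NumPy sketch. It assumes a pose network that outputs a 3-D translation for each frame pair and GPS fixes given as (latitude, longitude) in degrees. The names `haversine_m`, `g2s_loss`, and `dynamic_weight`, the mean-absolute-error form, and the geometric weighting schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between GPS fixes given in degrees (scalars or arrays)."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    # Clip guards against rounding pushing `a` slightly outside [0, 1].
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def g2s_loss(pred_translation, gps_src, gps_tgt, weight):
    """Hypothetical g2s-style term: penalize the gap between the predicted
    inter-frame translation magnitude and the GPS-measured distance.

    pred_translation: (N, 3) translations from a pose network (assumed input)
    gps_src, gps_tgt: (N, 2) arrays of (lat, lon) in degrees for the frame pairs
    """
    pred_dist = np.linalg.norm(pred_translation, axis=-1)
    gps_dist = haversine_m(gps_src[:, 0], gps_src[:, 1], gps_tgt[:, 0], gps_tgt[:, 1])
    return weight * np.mean(np.abs(pred_dist - gps_dist))


def dynamic_weight(epoch, base=0.05, growth=1.2):
    """Hypothetical schedule: the scale term gains influence as training proceeds."""
    return base * growth ** epoch
```

In training, a term like this would be added to the appearance-based (photometric) loss; at inference only the depth network runs, so no GPS input is required.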

Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics

Arnav, Chawla, Zonooz (2022)

The advent of autonomous driving and advanced driver assistance systems necessitates continuous developments in computer vision for 3D scene understanding. Self-supervised monocular depth estimation, a method for pixel-wise distance estimation of objects from a single camera without the use of ground-truth labels, is an important task in 3D scene understanding. However, existing methods for this task are limited to convolutional neural network (CNN) architectures. In contrast to CNNs, which use localized linear operations and lose feature resolution across layers, vision transformers process features at constant resolution with a global receptive field at every stage. While recent works have compared transformers against their CNN counterparts for tasks such as image classification, no study exists that investigates the impact of using transformers for self-supervised monocular depth estimation. Here, we first demonstrate how to adapt vision transformers for self-supervised monocular depth estimation. Thereafter, we compare transformer- and CNN-based architectures in terms of their performance on the KITTI depth prediction benchmarks and their robustness to natural corruptions and adversarial attacks, including when the camera intrinsics are unknown. Our study demonstrates that the transformer-based architecture, though less efficient at run time, achieves comparable performance while being more robust and generalizable.
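The contrast between local convolution and global self-attention can be seen in a toy NumPy sketch, which is not the architecture studied in the paper. Perturbing one input token changes every output of a single self-attention layer but only the three neighboring outputs of a size-3 convolution, and the attention layer returns as many tokens as it receives. The function names, token count, and feature size are hypothetical choices for illustration.

```python
import numpy as np


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over N tokens of dimension D."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v  # (N, D): token count (resolution) is unchanged


def conv1d_same(x, kernel):
    """Size-3 convolution along the token axis with zero padding, shared across channels."""
    padded = np.pad(x, ((1, 1), (0, 0)))
    n = x.shape[0]
    # Output j mixes only input tokens j-1, j, j+1 (a local receptive field).
    return sum(kernel[i] * padded[i : i + n] for i in range(3))


def influenced_positions(layer, x, idx, eps=1e-3):
    """Indices of output tokens that change when input token `idx` is perturbed."""
    x2 = x.copy()
    x2[idx] += eps
    diff = np.abs(layer(x2) - layer(x)).sum(axis=-1)
    return np.nonzero(diff > 1e-12)[0]
```

With a random 8×4 input, `influenced_positions` on the attention layer with `idx=4` would be expected to return all eight positions, while the convolution returns only positions 3, 4, and 5.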

Highlighting the Importance of Reducing Research Bias and Carbon Emissions in CNNs

Badar, Arnav, Adrian et al. (2022)

A Comprehensive Study of Vision Transformers on Dense Prediction Tasks

Kishaan, Kathiresan, Arnav et al. (2022)
