2022 International Conference on 3D Vision (3DV)
DOI: 10.1109/3dv57658.2022.00077

MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer

Cited by 92 publications (56 citation statements)
References 64 publications

“…Table 5 reports the depth estimation results for Monodepth2 and MonoViT trained on KITTI and evaluated on KITTI and the OOD test set vKITTI. The results are provided using the standard metrics [54,16,1]: absolute relative error (Abs Rel), root mean squared error (RMSE), and accuracy δ1 (δ < 1.25). Comparing the results, it is evident that the depth estimates on vKITTI are significantly worse in all three metrics.…”
Section: Depth Estimation Results on vKITTI
Mentioning confidence: 99%
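
The metrics quoted above are the standard KITTI depth-evaluation measures. As a reference, here is a minimal NumPy sketch of how they are typically computed; the function name, the validity mask, and the return layout are illustrative assumptions, not code from the cited papers.

```python
# Minimal sketch (illustrative, not from the cited papers) of the standard
# monocular depth metrics mentioned above: Abs Rel, RMSE, and delta_1,
# the fraction of pixels where max(pred/gt, gt/pred) < 1.25.
import numpy as np

def depth_metrics(gt: np.ndarray, pred: np.ndarray) -> dict:
    mask = gt > 0                    # evaluate only pixels with valid ground truth
    gt, pred = gt[mask], pred[mask]

    abs_rel = np.mean(np.abs(gt - pred) / gt)    # absolute relative error
    rmse = np.sqrt(np.mean((gt - pred) ** 2))    # root mean squared error
    delta1 = np.mean(np.maximum(gt / pred, pred / gt) < 1.25)  # accuracy δ < 1.25

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```
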
“…Models: Compared to recent work [46,41,22], we evaluate not only on Monodepth2 [16] but also on two recently published transformer-based models, Pixelformer [1] and MonoViT [54]. In the case of NYU, Monodepth2 and Pixelformer are trained in a supervised manner.…”
Section: Methods
Mentioning confidence: 99%

“…Monodepth2 [16] used an auto-masking loss to reject objects moving at a speed similar to the camera, proposed a per-pixel minimum reprojection loss to handle occlusion, and introduced a multi-scale sampling method to reduce visual artifacts. Lite-Mono [31] proposed a Consecutive Dilated Convolutions (CDC) module to extract rich multi-scale local features and a Local-Global Features Interaction (LGFI) module to encode long-range global information into the features. R-MSFM [32] proposed recurrent multi-scale feature modulation: it extracts per-pixel features, builds a multi-scale feature modulation module, and iteratively updates the inverse depth at a fixed resolution through a parameter-shared decoder. FeatDepth [17] introduced the FeatureNet architecture for single-view reconstruction, in addition to the cross-view reconstruction networks DepthNet and PoseNet.…”
Section: Self-Supervised Depth Estimation
Mentioning confidence: 99%
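
Since the quote above summarizes Monodepth2's minimum reprojection loss and auto-masking, a compact PyTorch sketch may help make the mechanism concrete. It substitutes a plain L1 error for the SSIM+L1 photometric term used in the paper, and all function and tensor names are illustrative assumptions, not the authors' code.

```python
# Sketch of Monodepth2-style per-pixel minimum reprojection with auto-masking.
# Uses a plain L1 photometric error for brevity (the paper mixes SSIM and L1).
import torch

def photometric_error(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # (B, 3, H, W) images -> (B, 1, H, W) per-pixel error
    return (a - b).abs().mean(dim=1, keepdim=True)

def min_reprojection_loss(target, warped_sources, raw_sources):
    # Per-pixel minimum over the warped source views mitigates occlusion:
    # each pixel only needs to be explained by one source frame.
    reproj = torch.stack([photometric_error(target, w) for w in warped_sources])
    min_reproj = reproj.min(dim=0).values

    # Auto-masking: ignore pixels where the *unwarped* source already matches
    # the target at least as well (static scenes, objects moving with the camera).
    identity = torch.stack([photometric_error(target, s) for s in raw_sources])
    min_identity = identity.min(dim=0).values
    mask = (min_reproj < min_identity).float()

    return (mask * min_reproj).sum() / mask.sum().clamp(min=1.0)
```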