2020 25th International Conference on Pattern Recognition (ICPR), 2021
DOI: 10.1109/icpr48806.2021.9412108

Delivering Meaningful Representation for Monocular Depth Estimation

Cited by 6 publications (8 citation statements)
References 26 publications
“…To estimate the depth in meters of the maps generated by pix2pix and CycleGAN, the GLPN model fine-tuned on NYUv2 was used, as described in [25]. The model can be found at [26]. The pipeline returns a dictionary with two entries.…”
Section: Results Obtained and Ablation Studies (unclassified)
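The two-entry dictionary the statement mentions matches the Hugging Face depth-estimation pipeline. Below is a minimal sketch of that usage; it assumes the checkpoint referred to as [26] is the publicly hosted "vinvino02/glpn-nyu" GLPN model fine-tuned on NYUv2, and the input filename is hypothetical:

```python
from transformers import pipeline
from PIL import Image

# Assumed checkpoint: GLPN fine-tuned on NYUv2, hosted on the Hugging Face Hub.
depth_estimator = pipeline("depth-estimation", model="vinvino02/glpn-nyu")

result = depth_estimator(Image.open("indoor_scene.jpg"))  # hypothetical input image

# The pipeline returns a dictionary with two entries:
#   "predicted_depth": the raw depth tensor produced by the model
#   "depth":           a PIL image of the depth map, rescaled for viewing
print(result["predicted_depth"].shape)
result["depth"].save("depth_map.png")
```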
“…Recently, Li et al [25] design MonoIndoor++, a framework that takes into account the main challenges of indoor scenarios. Kim et al [26] propose GLPDepth, a global-local transformer network that extracts meaningful features at different scales, with a Selective Feature Fusion CNN block in the decoder. The authors also integrate a revisited version of the CutDepth data augmentation method [27], which improves training on the NYU Depth v2 dataset without requiring additional data.…”
Section: B. ViT-based MDE Methods (mentioning)
confidence: 99%
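For context, the CutDepth augmentation cited above pastes a random crop of the ground-truth depth map onto the input RGB image, injecting depth cues without extra data. A minimal NumPy sketch of that idea follows; the application probability, rectangle-size ranges, and normalization are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def cutdepth(image, depth, p=0.75, rng=np.random.default_rng()):
    """CutDepth-style augmentation sketch (after [27]): replace a random
    rectangle of the RGB input with the corresponding ground-truth depth.
    image: (H, W, 3) float array in [0, 1]; depth: (H, W) float array."""
    if rng.random() > p:  # apply with probability p (assumed value)
        return image
    h, w = depth.shape
    # Sample rectangle size and position; ranges are illustrative assumptions.
    rw = int(w * rng.uniform(0.1, 0.5))
    rh = int(h * rng.uniform(0.1, 0.5))
    x0 = rng.integers(0, w - rw + 1)
    y0 = rng.integers(0, h - rh + 1)
    out = image.copy()
    crop = depth[y0:y0 + rh, x0:x0 + rw]
    # Normalize the depth crop to [0, 1] and replicate it across channels.
    crop = (crop - crop.min()) / (crop.max() - crop.min() + 1e-8)
    out[y0:y0 + rh, x0:x0 + rw] = crop[..., None]
    return out
```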
“…Kim et al introduced GLPDepth [18], a Transformer-based architecture and training strategy for monocular depth estimation that considers both the global and local context of the image. It uses the SegFormer encoder [40] to capture global dependencies and a lightweight decoder with skip connections to integrate local information.…”
Section: A. Relevant State-of-the-art Vision Transformer Models for Se... (mentioning)
confidence: 99%
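The global-plus-local design described here fuses upsampled decoder features with same-resolution encoder skip features at each scale. A minimal PyTorch sketch of a Selective-Feature-Fusion-style block in that spirit; the exact layer layout in GLPDepth [18] may differ from this approximation:

```python
import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    """Sketch of an SFF-style block: predict per-pixel weights that decide
    how much to trust the global decoder stream vs. the local skip stream."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, global_feat, local_feat):
        # Two attention maps, one per input stream.
        w = self.attn(torch.cat([global_feat, local_feat], dim=1))
        return global_feat * w[:, 0:1] + local_feat * w[:, 1:2]

# Usage: fuse an upsampled decoder feature with a same-resolution skip feature.
fuse = SelectiveFusion(channels=64)
out = fuse(torch.randn(1, 64, 120, 160), torch.randn(1, 64, 120, 160))
```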
“…Unlike them, our method is a simple single-stage method that takes an image as input and performs the joint segmentation and depth estimation tasks in a single forward pass.
• For this purpose, we designed a hybrid encoding and decoding framework based on the Vision Transformer variants SegFormer [40] and GLPDepth [18].
• We chose the best model for each task (segmentation and depth estimation) to design a multitask model, based on a thorough assessment of their advantages and drawbacks.…”
Section: Introduction (mentioning)
confidence: 99%
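To make the single-forward-pass idea concrete, here is a minimal PyTorch sketch of a shared encoder with two task heads. The tiny convolutional encoder merely stands in for the SegFormer/GLPDepth hybrid the authors actually build, and all layer sizes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSegDepth(nn.Module):
    """Single-stage multitask sketch: one shared encoder, two light heads,
    both outputs produced in a single forward pass."""
    def __init__(self, num_classes=19, width=64):
        super().__init__()
        # Stand-in encoder; the cited framework uses a SegFormer-style
        # Transformer backbone instead of these convolutions.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(width, num_classes, 1)           # class logits
        self.depth_head = nn.Sequential(nn.Conv2d(width, 1, 1), nn.Sigmoid())

    def forward(self, x):
        feats = self.encoder(x)
        size = x.shape[2:]
        seg = F.interpolate(self.seg_head(feats), size=size,
                            mode="bilinear", align_corners=False)
        depth = F.interpolate(self.depth_head(feats), size=size,
                              mode="bilinear", align_corners=False)
        return seg, depth  # both tasks from one pass over shared features

seg_logits, depth_map = JointSegDepth()(torch.randn(1, 3, 240, 320))
```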