2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023
DOI: 10.1109/wacv56688.2023.00286

TransVLAD: Multi-Scale Attention-Based Global Descriptors for Visual Geo-Localization

Cited by 13 publications (5 citation statements)
References 31 publications
“…Several studies have shown that multiscale information can be utilized to effectively improve performance of a model in the VPR task 10,13,18,29,40,41 ; this approach can effectively avoid the problem of localization failure caused by the scale difference between the query image and the reference image. The differences between these approaches are how the multiscale information is obtained and how to collect more spatial contextual information when considering multiscale information.…”
Section: Leveraging Multiscale Information
Citation type: mentioning
confidence: 99%
“…Peng et al 40 used a semantically enhanced local weighting scheme for local feature refinement and then constructed an attention pyramid based on the spatial saliency of regional features for the adaptive encoding of local features. Xu et al 41 used an attention-based sparse encoder to obtain feature maps, thereby capturing global dependencies. Then, self-supervised learning was utilized to further acquire multiscale information between query images, effectively reducing the visual ambiguities arising in large-scale VPR.…”
Section: Leveraging Multiscale Information
Citation type: mentioning
confidence: 99%
“…The ViT [19] model utilizes the classic Transformer encoder structure to achieve image classification tasks, marking the beginning of the Transformer's application in the field of vision and gradually playing a role in cross-view geo-localization [57]. Following the architecture of NetVLAD [58], TransVLAD [15] utilizes a sparse transformer encoder to obtain global descriptors. It was further combined with DFM [59] to obtain more dense and accurate matching results.…”
Section: B. Transformer in Geo-localization
Citation type: mentioning
confidence: 99%
“…The remarkable contextual modeling capability of the transformer compensates for the limitations of CNNs. At present, transformer-based cross-view geo-localization technology mainly utilizes transformer encoders as the backbone of feature extraction, improving the ability of contextual feature extraction [15], [16], [17], [18]. Some methods use ViT [19] as the backbone for extracting context-sensitive information [20], [21] to better adapt to image data.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…NetVLAD is a differentiable implementation of VLAD that is trained end-to-end with a CNN backbone for direct place recognition. It has been widely adopted in various works [9], [11], [14], [24]. Task-specific patch-level features have also been explored for VPR [9], [25].…”
Section: A. Recent Progress on Visual Place Recognition (VPR)
Citation type: mentioning
confidence: 99%