2022
DOI: 10.1109/access.2022.3162693

A Semantic Guidance and Transformer-Based Matching Method for UAVs and Satellite Images for UAV Geo-Localization

Abstract: It is a challenging task for unmanned aerial vehicles (UAVs) without a positioning system to locate targets by using images. Matching drone and satellite images is one of the key steps in this task. Due to the large angle and scale gap between drone and satellite views, it is very important to extract fine-grained features with strong characterization ability. Most of the published methods are based on the CNN structure, but such methods lose a lot of information. This is caused by the limita…

Cited by 14 publications (12 citation statements)
References 49 publications
“…In the Satellite->Drone task, R@1 is 92.30% and AP is 87.66%. Its performance surpasses advanced methods such as LPN [29], FSRA [4], SGM [32], and PAAN [50]. On the self-made dataset, the accuracy exhibits a trend similar to that on the University-1652 dataset: our method outperforms the aforementioned two methods.…”
Section: Quantitative Statistics (mentioning; confidence: 66%)
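R@1 and AP, quoted above, are standard retrieval metrics. The following is a minimal Python sketch of their usual per-query definitions, assuming each query comes with a ranked gallery list and a set of true matches; the function names are illustrative, and the exact evaluation protocol used by the citing paper is not given here. Reported scores are typically these per-query values averaged over all queries. Because a Satellite->Drone query on University-1652 has several matching drone images, AP also penalizes true matches ranked below non-matches, which is one reason AP can sit below R@1.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any true match appears among the top-k results, else 0.0."""
    return 1.0 if any(g in relevant for g in ranked_ids[:k]) else 0.0


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@rank taken at each rank where a true match is retrieved.

    Assumes every true match appears somewhere in ranked_ids (full-gallery ranking).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Illustrative query: three true matches (d*) interleaved with non-matches (x*).
ranking = ["d3", "x1", "d7", "x2", "d9"]
matches = {"d3", "d7", "d9"}
print(recall_at_k(ranking, matches, 1))     # 1.0
print(average_precision(ranking, matches))  # (1/1 + 2/3 + 3/5) / 3, about 0.756
```

Here the single query scores a perfect R@1 while its AP is only about 0.756, because two of its three true matches are ranked behind non-matches.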
“…Dai [4] achieved automatic region segmentation based on the heat distribution of Transformer feature maps, aligning specific regions in different views to improve the model's accuracy and robustness to location variations. Zhuang [32] proposed a Transformer-based network to match drone images with satellite images. This network classifies each pixel in the image using pixel-wise attention, matching the same semantic parts in two images.…”
Section: Cross-view Geolocalization (mentioning; confidence: 99%)
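The heat-based region segmentation attributed to FSRA can be sketched as follows. This is a simplified illustration, not the FSRA implementation: it assumes a patch token's "heat" is the mean of its feature channels, sorts patches by heat, cuts the sorted list into n near-equal regions, and average-pools each region into a part descriptor. The names heat_partition and pool_regions are hypothetical. Because regions are defined by heat rank rather than by fixed position, region k of a drone image and region k of a satellite image can be compared even when the target is shifted.

```python
from typing import List

Vector = List[float]


def heat_partition(patches: List[Vector], n_parts: int) -> List[List[int]]:
    """Group patch indices into n_parts regions by descending heat.

    Assumption: a patch's heat is the mean of its feature channels.
    Requires len(patches) >= n_parts so that no region is empty.
    """
    if not 0 < n_parts <= len(patches):
        raise ValueError("need 1 <= n_parts <= number of patches")
    heat = [sum(p) / len(p) for p in patches]
    order = sorted(range(len(patches)), key=lambda i: heat[i], reverse=True)
    n = len(order)
    # Cut the heat-sorted order into contiguous, near-equal chunks.
    bounds = [k * n // n_parts for k in range(n_parts + 1)]
    return [order[bounds[k]:bounds[k + 1]] for k in range(n_parts)]


def pool_regions(patches: List[Vector], regions: List[List[int]]) -> List[Vector]:
    """Average-pool the patch features inside each region into one part descriptor."""
    dim = len(patches[0])
    return [
        [sum(patches[i][c] for i in region) / len(region) for c in range(dim)]
        for region in regions
    ]


# Four toy 2-channel patch tokens; heats are 0.1, 0.8, 0.5 and 0.2.
patches = [[0.1, 0.1], [0.9, 0.7], [0.5, 0.5], [0.3, 0.1]]
regions = heat_partition(patches, 2)  # [[1, 2], [3, 0]]: hottest half, then the rest
parts = pool_regions(patches, regions)  # approximately [[0.7, 0.6], [0.2, 0.1]]
```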
“…For instance, the FSRA (Dai et al., 2021) automatically divided the original image into multiple regions based on the heat distribution of the feature map, and achieved feature alignment based on region consistency. Zhuang et al. (2022) introduced semantic constraints based on FSRA to enhance the effectiveness of feature alignment. However, a limitation shared by all the above methods is that the training process still involves the backbone network, resulting in increased computational and time overhead for the retrieval task.…”
Section: Cross-view Remote Sensing Image Retrieval (mentioning; confidence: 99%)
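The idea of matching the same semantic parts across two views can be illustrated with a hypothetical sketch. Each patch is assigned to one of K semantic parts by the argmax of per-patch part scores (supplied as input here, standing in for the pixel-wise classification described above), features are averaged per part, and the two views are compared part by part with cosine similarity. The names semantic_part_pool and part_similarity are invented for illustration and are not taken from the cited work.

```python
import math
from typing import List, Optional

Vector = List[float]


def semantic_part_pool(features: List[Vector], part_scores: List[Vector]) -> List[Optional[Vector]]:
    """Average patch features per semantic part, assigning each patch by argmax score.

    Returns one descriptor per part, or None for a part no patch was assigned to.
    """
    n_parts, dim = len(part_scores[0]), len(features[0])
    sums = [[0.0] * dim for _ in range(n_parts)]
    counts = [0] * n_parts
    for feat, scores in zip(features, part_scores):
        part = max(range(n_parts), key=lambda j: scores[j])  # hard per-patch assignment
        counts[part] += 1
        for c in range(dim):
            sums[part][c] += feat[c]
    return [
        [s / counts[p] for s in sums[p]] if counts[p] else None
        for p in range(n_parts)
    ]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def part_similarity(parts_a: List[Optional[Vector]], parts_b: List[Optional[Vector]]) -> float:
    """Mean cosine similarity over semantic parts present in both views."""
    sims = [cosine(a, b) for a, b in zip(parts_a, parts_b) if a is not None and b is not None]
    return sum(sims) / len(sims) if sims else 0.0
```

Comparing descriptors part by part, rather than pooling the whole image into one vector, means a drone view only scores highly against a satellite view whose corresponding parts agree.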
“…In contrast, Swin Transformer [22] introduces a hierarchical structure and a shift window mechanism, which enhances the model's ability to capture spatial relationships between neighboring regions through window sliding and offsetting. As a result, Swin Transformer is able to capture local features in an image more effectively, while ensuring the integration of global information through cross-window connections [60]. In addition, Swin Transformer improves computational efficiency by performing self-attention computation independently within each window, enabling the model to process windows in parallel.…”
Section: B. Transformer in Geo-localization (mentioning; confidence: 99%)
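Window partitioning and the shifted-window step described above can be sketched on a plain H x W grid of tokens. This is a simplified illustration under stated assumptions: the real Swin Transformer operates on tensors, computes self-attention inside each window, and applies an attention mask to tokens that wrap around after the shift, all of which are omitted here. Shifting by s positions up and left mirrors a cyclic roll by -s on both spatial axes.

```python
from typing import List, TypeVar

T = TypeVar("T")


def window_partition(grid: List[List[T]], m: int) -> List[List[T]]:
    """Split an H x W token grid into non-overlapping m x m windows (row-major order).

    Each returned window is a flat list of m*m tokens; in Swin, self-attention
    would be computed separately inside each of these lists.
    """
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid height and width must be multiples of the window size")
    return [
        [grid[r][c] for r in range(r0, r0 + m) for c in range(c0, c0 + m)]
        for r0 in range(0, h, m)
        for c0 in range(0, w, m)
    ]


def cyclic_shift(grid: List[List[T]], s: int) -> List[List[T]]:
    """Roll the grid up and left by s positions (a roll by -s on both axes)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + s) % h][(c + s) % w] for c in range(w)] for r in range(h)]


# 4 x 4 grid of token ids 0..15, windows of size 2.
tokens = [[r * 4 + c for c in range(4)] for r in range(4)]
print(window_partition(tokens, 2)[0])                   # [0, 1, 4, 5]
print(window_partition(cyclic_shift(tokens, 1), 2)[0])  # [5, 6, 9, 10]
```

After the shift, the first window holds tokens 5, 6, 9 and 10, which originally belonged to four different windows; this is the cross-window connection the quote refers to. Since attention is restricted to m x m tokens per window, its cost grows with the number of windows rather than with the square of the total token count.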