Cross-View Geo-Localization via Learning Disentangled Geometric Layout Correspondence

Zhang, Xiaohan; Li, Xingyu; Sultani, Waqas; Zhou, Yi; Wshah, Safwan

doi:10.1609/aaai.v37i3.25457

Cited by 8 publications

(4 citation statements)

References 35 publications

(115 reference statements)

“…Rashkovetsky et al combined multiple U-Nets [11] to detect and segment wildfires from multi-band satellite images [6]. Zhang et al proposed to identify the location of ground images by comparing the learned features from geo-tagged satellite images [25,26]. Li et al studied poverty mapping from satellite images by proposing a point-to-region dynamic learning framework that can help people find clean water and sanitation services in low-income countries [27].…”

Section: Satellite Imagery Analysis (mentioning)

confidence: 99%

“…Different from some of the methods mentioned above, for example, refs. [25,27], which only use color channel information, considering the nature of permeable surface mapping, we also take advantage of the infrared band. Our extensive experiments and ablation studies in Section 4 demonstrate the effectiveness of the infrared band in our proposed model.…”

Section: Satellite Imagery Analysis (mentioning)

confidence: 99%

Fine-Grained Permeable Surface Mapping through Parallel U-Net

Ogilvie, Zhang, Kochenour, et al. 2024

Sensors

Self Cite

Permeable surface mapping, which is mainly the identification of surface materials that allow water to percolate, is essential for various environmental and civil engineering applications, such as urban planning, stormwater management, and groundwater modeling. Traditionally, this task involves labor-intensive manual classification, but deep learning offers an efficient alternative. Although several studies have tackled aerial image segmentation, the challenges of permeable surface mapping in arid environments remain largely unexplored because of the difficulties in distinguishing pixel values of the input data and due to the unbalanced distribution of its classes. To address these issues, this research introduces a novel approach using a parallel U-Net model for the fine-grained semantic segmentation of permeable surfaces. The process involves binary classification to distinguish between entirely and partially permeable surfaces, followed by fine-grained classification into four distinct permeability levels. Results show that this novel method enhances accuracy, particularly when working with small, unbalanced datasets dominated by a single category. Furthermore, the proposed model is capable of generalizing across different geographical domains. Domain adaptation is explored to transfer knowledge from one location to another, addressing the challenges posed by varying environmental characteristics. Experiments demonstrate that the parallel U-Net model outperforms the baseline methods when applied across domains. To support this research and inspire future research, a novel permeable surface dataset is introduced, with pixel-wise fine-grained labeling for five distinct permeable surface classes. In summary, in this work, we offer a novel solution to permeable surface mapping, extend the boundaries of arid environment mapping, introduce a large-scale permeable surface dataset, and explore cross-area applications of the proposed model. The three contributions enhance the efficiency and accuracy of permeable surface mapping and advance the field.

“…L2LTR [16] employs a transformer encoder as a backbone, utilizing self and cross attention mechanisms to emulate global dependency relationships between adjacent layers, thereby enhancing the quality of the learned representations. GeoDTR [17] utilizes a transformer encoder to separate geometric information from the original features and, through a novel geometric contextual extraction module, it can learn the spatial correlations between visual features in satellite and ground images. TransGeo [18] fully leverages the advantages of transformer encoder global information modeling and explicit positional encoding, reducing computational costs and enhancing performance.…”

Section: B Transformer In Geo-localization (mentioning)

confidence: 99%

“…The remarkable contextual modeling capability of the transformer compensates for the limitations of CNNs. At present, transformer-based cross-view geo-localization technology mainly utilizes transformer encoders as the backbone of feature extraction, improving the ability of contextual feature extraction [15], [16], [17], [18]. Some methods use ViT [19] as the backbone for extracting context-sensitive information [20], [21] to better adapt to image data.…”

Section: Introduction (mentioning)

confidence: 99%

GeoFormer: An Effective Transformer-Based Siamese Network for UAV Geolocalization

Li, Yang, Fan, et al. 2024

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

Cross-view geo-localization of unmanned aerial vehicles (UAVs) is a challenging task due to the positional discrepancies and uncertainties in scale and distance between UAVs and satellite views. Existing transformer-based geolocalization methods mainly use encoders to mine image contextual information. However, these methods have some limitations when dealing with scale changes in cross-view images. Therefore, we present an effective transformer-based Siamese network tailored for UAV geo-localization, called GeoFormer. Firstly, an efficient transformer feature extraction network was designed, which utilizes linear attention to reduce the computational complexity and improve the computational efficiency of the network. Within this network, we designed an efficient separable perceptron module based on depth-wise separable convolution, which can effectively reduce the computational cost while improving the feature representation of the network. Secondly, we proposed a multi-scale feature aggregation module (MFAM), which deeply fuses salient features at different scales through a feed-forward neural network to generate global feature representations with rich semantics, which improves the model's ability to capture image details and represent robust features. Additionally, we designed a semantic-guided region segmentation module (SRSM), which utilizes a k-modes clustering algorithm to divide the feature map into multiple regions with semantic consistency and performs feature recognition within each semantic region to improve the accuracy of image matching. Finally, we designed a hierarchical reinforcement rotation matching strategy to achieve accurate UAV geo-localization based on the retrieval results of UAV view query satellite images using SuperPoint keypoint extraction and LightGlue rotation matching. According to the experimental results, our method effectively achieves UAV geo-localization.