2023
DOI: 10.1109/tgrs.2023.3280546

Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval

Cited by 13 publications (3 citation statements)
References 53 publications
“…For instance, Semantic Structure Preserved cross-modal Embedding (SSPE) [85] constructs semantic graphs from label vectors, models multi-modal data with nonlinear neural networks, trains the model to preserve the local structure of semantic graphs, and reconstructs labels to retain global semantic information. 9) Transformer methods: Transformer methods [89]- [91] draw inspiration from the transformer architecture, which leverages multi-head attention mechanisms to encode multimodal relationships. Typically, Rethinking Label-wise Cross-Modal Retrieval (RLCMR) [89] transforms multi-modal data into individual tokens and combines them within a unified transformer model.…”
Section: Metric Learning Methods (mentioning, confidence: 99%)
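As a rough illustration of the token-fusion idea described in the statement above, the sketch below projects image and text features into a shared token space and encodes them jointly with a single transformer encoder. This is a minimal, hypothetical rendering of the unified-transformer pattern attributed to RLCMR, not the authors' implementation; all module names and dimensions are assumptions.

```python
# Hedged sketch (not the RLCMR authors' code): image and text features
# become tokens in one sequence that a single transformer encoder fuses.
import torch
import torch.nn as nn

class UnifiedCrossModalEncoder(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        # Project each modality into a shared token space (dims are illustrative).
        self.img_proj = nn.Linear(img_dim, d_model)
        self.txt_proj = nn.Linear(txt_dim, d_model)
        # Learned embeddings mark which modality each token came from.
        self.modality_embed = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, Ni, img_dim); txt_tokens: (B, Nt, txt_dim)
        img = self.img_proj(img_tokens) + self.modality_embed.weight[0]
        txt = self.txt_proj(txt_tokens) + self.modality_embed.weight[1]
        tokens = torch.cat([img, txt], dim=1)  # one unified multimodal sequence
        fused = self.encoder(tokens)           # joint multi-head attention
        return fused.mean(dim=1)               # pooled joint embedding

# Usage: fused = UnifiedCrossModalEncoder()(torch.randn(2, 49, 2048), torch.randn(2, 20, 300))
```

A pooled joint embedding of this kind could then feed a category-prediction head, consistent with the statement that the network is trained to predict categories.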
“…The network is trained to predict categories, optimizing the model to capture meaningful semantic correlations across modalities. Interacting-Enhancing Feature Transformer (IEFT) [91] treats pairs of images and texts as a unity, modeling their intrinsic correlation and introducing feature enhancement to improve cross-modal retrieval accuracy.…”
Section: Metric Learning Methods (mentioning, confidence: 99%)
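The pair-as-a-unit interaction and feature enhancement attributed to IEFT might look roughly like the following: bidirectional cross-attention models the intrinsic correlation within one image-text pair, and a gated residual step enhances each modality's features with the attended context. This is a hedged sketch under assumed dimensions and block names; the quoted statement does not specify IEFT's actual modules.

```python
# Hedged sketch of the pair-as-a-unit idea attributed to IEFT; all names
# and dimensions here are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class InteractEnhanceBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.txt2img = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.gate_img = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.gate_txt = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, img, txt):
        # Interaction: each modality queries the other within the same pair.
        img_ctx, _ = self.img2txt(img, txt, txt)  # image attends to text
        txt_ctx, _ = self.txt2img(txt, img, img)  # text attends to image
        # Enhancement: gated residual mixing of original and attended features.
        g_i = self.gate_img(torch.cat([img, img_ctx], dim=-1))
        g_t = self.gate_txt(torch.cat([txt, txt_ctx], dim=-1))
        return img + g_i * img_ctx, txt + g_t * txt_ctx

# Usage: img2, txt2 = InteractEnhanceBlock()(torch.randn(2, 49, 512), torch.randn(2, 20, 512))
```

The gating is one plausible way to realize "feature enhancement": it lets each token decide how much attended cross-modal context to absorb before the pair is scored for retrieval.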