2023
DOI: 10.1109/tgrs.2023.3280546

Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval

Cited by 13 publications (3 citation statements)
References 53 publications
“…For instance, Semantic Structure Preserved cross-modal Embedding (SSPE) [85] constructs semantic graphs from label vectors, models multi-modal data with nonlinear neural networks, trains the model to preserve the local structure of semantic graphs, and reconstructs labels to retain global semantic information. 9) Transformer methods: Transformer methods [89]- [91] draw inspiration from the transformer architecture, which leverages multi-head attention mechanisms to encode multimodal relationships. Typically, Rethinking Label-wise Cross-Modal Retrieval (RLCMR) [89] transforms multi-modal data into individual tokens and combines them within a unified transformer model.…”
Section: Metric Learning Methods (mentioning, confidence: 99%)
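As a rough illustration of the token-fusion idea described in the statement above, the sketch below projects image and text features into a shared token space and encodes them jointly with a single transformer encoder. This is a minimal, hypothetical rendering of the unified-transformer pattern attributed to RLCMR, not the authors' implementation; all module names and dimensions are assumptions.

```python
# Hedged sketch (not the RLCMR authors' code): image and text features
# become tokens in one sequence that a single transformer encoder fuses.
import torch
import torch.nn as nn

class UnifiedCrossModalEncoder(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        # Project each modality into a shared token space (dims are illustrative).
        self.img_proj = nn.Linear(img_dim, d_model)
        self.txt_proj = nn.Linear(txt_dim, d_model)
        # Learned embeddings mark which modality each token came from.
        self.modality_embed = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, Ni, img_dim); txt_tokens: (B, Nt, txt_dim)
        img = self.img_proj(img_tokens) + self.modality_embed.weight[0]
        txt = self.txt_proj(txt_tokens) + self.modality_embed.weight[1]
        tokens = torch.cat([img, txt], dim=1)  # one unified multimodal sequence
        fused = self.encoder(tokens)           # joint multi-head attention
        return fused.mean(dim=1)               # pooled joint embedding

# Usage: fused = UnifiedCrossModalEncoder()(torch.randn(2, 49, 2048), torch.randn(2, 20, 300))
```

A pooled joint embedding of this kind could then feed a category-prediction head, consistent with the statement that the network is trained to predict categories.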
“…The network is trained to predict categories, optimizing the model to capture meaningful semantic correlations across modalities. Interacting-Enhancing Feature Transformer (IEFT) [91] treats pairs of images and texts as a unity, modeling their intrinsic correlation and introducing feature enhancement to improve cross-modal retrieval accuracy.…”
Section: Metric Learning Methods (mentioning, confidence: 99%)
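The pair-as-a-unit interaction and feature enhancement attributed to IEFT might look roughly like the following: bidirectional cross-attention models the intrinsic correlation within one image-text pair, and a gated residual step enhances each modality's features with the attended context. This is a hedged sketch under assumed dimensions and block names; the quoted statement does not specify IEFT's actual modules.

```python
# Hedged sketch of the pair-as-a-unit idea attributed to IEFT; all names
# and dimensions here are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class InteractEnhanceBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.txt2img = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.gate_img = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.gate_txt = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, img, txt):
        # Interaction: each modality queries the other within the same pair.
        img_ctx, _ = self.img2txt(img, txt, txt)  # image attends to text
        txt_ctx, _ = self.txt2img(txt, img, img)  # text attends to image
        # Enhancement: gated residual mixing of original and attended features.
        g_i = self.gate_img(torch.cat([img, img_ctx], dim=-1))
        g_t = self.gate_txt(torch.cat([txt, txt_ctx], dim=-1))
        return img + g_i * img_ctx, txt + g_t * txt_ctx

# Usage: img2, txt2 = InteractEnhanceBlock()(torch.randn(2, 49, 512), torch.randn(2, 20, 512))
```

The gating is one plausible way to realize "feature enhancement": it lets each token decide how much attended cross-modal context to absorb before the pair is scored for retrieval.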