2022
DOI: 10.1145/3499027
Cross-modal Graph Matching Network for Image-text Retrieval

Abstract: Image-text retrieval is a fundamental cross-modal task whose main idea is to learn image-text matching. Generally, according to whether there exist interactions during the retrieval process, existing image-text retrieval methods can be classified into independent representation matching methods and cross-interaction matching methods. The independent representation matching methods generate the embeddings of images and sentences independently and thus are convenient for retrieval with hand-crafted matching meas…


Cited by 56 publications (25 citation statements)
References 38 publications
“…To validate the efficiency of our proposed FB-Net, we compare it with several state-of-the-art methods, in which seven non-DNN-based cross-modal retrieval methods (i.e., CCA [3], CMCP [23], JRL [25], JFSSL [26], and S²UPG [27]) and nine DNN-based methods (i.e., DCCA [7], CCL [12], SCAN [15], GXN [28], VSESC [29], MAVA [30], SGRAF [31], SCL [41], CGMN [42], NAAF [32], and VSRN++ [33]) are contained. Note that the comparison methods are implemented using the authors' public source codes and are enumerated as follows.…”
Section: Compared Methods
confidence: 99%
“…• CGMN [42] uses graph convolutional networks to investigate the intra-relation in images and sentences and accomplishes interrelation reasoning between regions and words without impacting search efficiency.…”
Section: Compared Methods
confidence: 99%
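The statement above describes CGMN's core mechanism: graph convolutions over an intra-modal graph of regions (or words) to propagate relational context before cross-modal matching. A minimal sketch of one such graph-convolution step is given below; the function name, the fully connected region graph, and the single mean-aggregation layer are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def gcn_layer(features, adjacency, weight):
    """One graph-convolution step: average neighbor features over the
    intra-modal graph, then apply a linear transform and ReLU."""
    # Row-normalize the adjacency so each node averages its neighbors.
    degree = adjacency.sum(axis=1, keepdims=True)
    norm_adj = adjacency / np.clip(degree, 1e-8, None)
    return np.maximum(norm_adj @ features @ weight, 0.0)

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))   # 5 image regions, 8-dim features
adj = np.ones((5, 5))               # hypothetical fully connected region graph
w = rng.normal(size=(8, 8))         # learnable projection (random here)
out = gcn_layer(regions, adj, w)    # relation-enhanced region features, (5, 8)
```

Because this reasoning happens within each modality before embeddings are compared, the enhanced features can be precomputed and indexed, which is why such intra-relation reasoning need not impact search efficiency.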
“…Baseline and comparative methods: Basic cross-modal initial retrieval methods [3], [6], [7], [8], [10], [11], [34], [35], [36], [37], [38], [39] were used as baseline methods. By comparing our method against these baselines, we confirm that our re-ranking method can improve the initial retrieval performance.…”
Section: Microsoft Common Objects in Context (MSCOCO) [33]
confidence: 99%
“…In addition to a single attention module, Song and Soleymani [34] utilized a multi-head self-attention network to exploit polysemous meanings. In addition, the graph convolutional network (GCN) has been employed in several methods to consider the relationship between local features, and these methods demonstrated good performance [35], [36].…”
Section: A. Cross-modal Retrieval
confidence: 99%
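The multi-head self-attention mentioned above lets each subspace ("head") attend to a different aspect of a word or region sequence, which is how polysemous meanings can be captured. A minimal numpy sketch follows; real models learn separate query/key/value projections per head, whereas this illustrative version uses identity projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads):
    """Split features into heads, attend within each head, re-concatenate.
    Identity Q/K/V projections are a simplifying assumption."""
    n, d = x.shape
    assert d % num_heads == 0
    head_dim = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = x[:, h * head_dim:(h + 1) * head_dim]
        scores = softmax(q @ k.T / np.sqrt(head_dim))  # (n, n) attention
        heads.append(scores @ v)                       # weighted mix of tokens
    return np.concatenate(heads, axis=1)

rng = np.random.default_rng(1)
tokens = rng.normal(size=(4, 8))   # 4 word features, 8-dim each
attended = multi_head_self_attention(tokens, num_heads=2)  # (4, 8)
```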
“…For evaluating the effectiveness of our sentence-based semantic loss function, we introduce our loss to the training of recently proposed cross-modal image retrieval methods [31], [34], [35], [36]. We compared these cross-modal retrieval methods trained with our loss against their original versions.…”
Section: Implementation Details
confidence: 99%
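The training objective that such a semantic loss is typically added to is a hinge-based bidirectional ranking loss over an image-sentence similarity matrix. The sketch below shows that common baseline loss, not the citing paper's sentence-based semantic loss itself, which is not specified here.

```python
import numpy as np

def triplet_ranking_loss(sim, margin=0.2):
    """Bidirectional hinge ranking loss for image-text matching.
    sim[i, j] is the similarity of image i and sentence j; the diagonal
    holds the matched pairs. A generic stand-in, not the paper's loss."""
    n = sim.shape[0]
    pos = np.diag(sim).reshape(n, 1)
    cost_im = np.clip(margin + sim - pos, 0, None)    # sentence retrieval
    cost_s = np.clip(margin + sim - pos.T, 0, None)   # image retrieval
    mask = 1.0 - np.eye(n)                            # ignore matched pairs
    return float((cost_im * mask + cost_s * mask).sum())
```

With a perfectly separated similarity matrix (e.g. the identity), every negative pair violates no margin and the loss is zero; hard negatives close to the diagonal contribute positive cost.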