2021 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme51207.2021.9428194

Multi-Graph Based Hierarchical Semantic Fusion for Cross-Modal Representation

Abstract: The main challenge of cross-modal retrieval is how to efficiently realize semantic alignment and reduce the heterogeneity gap. However, existing approaches ignore the multi-grained semantic knowledge that can be learned from different modalities. To this end, this paper proposes a novel end-to-end cross-modal representation method, termed Multi-Graph based Hierarchical Semantic Fusion (MG-HSF). This method integrates multi-graph hierarchical semantic fusion with cross-modal adversarial learning, which capture…
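The abstract names cross-modal adversarial learning as one ingredient for shrinking the heterogeneity gap. The sketch below is not the MG-HSF implementation; it only illustrates the generic idea of projecting two modalities into a shared space and training a modality discriminator adversarially. All class names, network shapes, and feature dimensions here are assumptions made for illustration.

```python
# Minimal sketch of cross-modal adversarial alignment (NOT the MG-HSF code):
# image and text features are projected into a shared space, and a modality
# discriminator is trained adversarially so the two embedding distributions
# become indistinguishable, which reduces the heterogeneity gap.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps modality-specific features into a shared embedding space."""
    def __init__(self, in_dim: int, shared_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, shared_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class ModalityDiscriminator(nn.Module):
    """Predicts whether a shared-space embedding came from an image or a text."""
    def __init__(self, shared_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)

# Toy batch: 2048-d image features, 300-d text features (dimensions are assumptions).
img_feat, txt_feat = torch.randn(32, 2048), torch.randn(32, 300)
img_proj, txt_proj = ProjectionHead(2048), ProjectionHead(300)
disc = ModalityDiscriminator()

z_img, z_txt = img_proj(img_feat), txt_proj(txt_feat)

# Discriminator loss: tell image embeddings (label 1) apart from text (label 0).
d_logits = torch.cat([disc(z_img.detach()), disc(z_txt.detach())])
d_labels = torch.cat([torch.ones(32), torch.zeros(32)])
d_loss = F.binary_cross_entropy_with_logits(d_logits, d_labels)

# Projection ("generator") loss: fool the discriminator so the modalities align.
g_logits = torch.cat([disc(z_img), disc(z_txt)])
g_loss = F.binary_cross_entropy_with_logits(g_logits, 1.0 - d_labels)
print(d_loss.item(), g_loss.item())
```

In an actual training loop the two losses would be stepped with separate optimizers for the discriminator and the projection heads; the sketch only shows how the losses are formed.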

Cited by 17 publications (6 citation statements); references 24 publications (30 reference statements).
“…Cross-modal retrieval [22][23][24] is an important issue in the field of information retrieval and machine learning, as shown in Figure 1. It aims at retrieval or correlation matching between different types of data (such as images, text, audio, etc.…”
Section: Cross-modal Retrieval
confidence: 99%
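As a concrete illustration of the retrieval step this statement describes, here is a minimal sketch assuming image and text embeddings already live in a common space of a hypothetical dimension (128); the function and variable names are illustrative and not taken from any cited system.

```python
# Hedged sketch of cross-modal retrieval by similarity ranking: given embeddings
# that already share a space, a text query retrieves images by cosine similarity.
import torch
import torch.nn.functional as F

def rank_by_similarity(query: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """Return gallery indices sorted from most to least similar to the query."""
    q = F.normalize(query, dim=-1)    # (d,)
    g = F.normalize(gallery, dim=-1)  # (n, d)
    scores = g @ q                    # cosine similarity per gallery item
    return torch.argsort(scores, descending=True)

# Toy example: one 128-d text query against 1000 image embeddings.
text_query = torch.randn(128)
image_gallery = torch.randn(1000, 128)
top10 = rank_by_similarity(text_query, image_gallery)[:10]
print(top10.tolist())
```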
“…Multi-modal learning refers to learning from data that comes in more than one source or form. It can be divided into five categories: multi-modal representation learning (Zhang C. et al, 2021), modal transformation, alignment (Zhu et al, 2022), multi-modal fusion, and collaborative learning (Li et al, 2019). In this paper, because we use a multi-modal feature selection algorithm, we focus on multi-modal feature selection within multi-modal representation learning.…”
Section: Related Work
confidence: 99%
“…This situation indicates that effectively enhancing intra-modal semantic alignment is significant for improving recipe retrieval performance. For this purpose, a straightforward approach applied in many cross-modal retrieval tasks [46][47][48] is to apply a metric learning or contrastive learning strategy within each modality. However, there is a non-trivial issue, i.e., food image ambiguity, in cross-modal recipe retrieval that has not yet been considered.…”
Section: Introduction
confidence: 99%
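The "contrastive learning strategy within each modality" mentioned in the statement above is often instantiated as an InfoNCE-style loss between two views of the same item. The sketch below shows one such instantiation under assumed tensor names and dimensions; it is not the code of any cited work.

```python
# Illustrative intra-modal contrastive (InfoNCE-style) loss: anchors[i] and
# positives[i] are two embeddings of the same item (e.g., two augmented views
# of one food image); every other row in the batch serves as a negative.
import torch
import torch.nn.functional as F

def intra_modal_info_nce(anchors: torch.Tensor,
                         positives: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(a.size(0))  # matching pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random 256-d embeddings for a batch of 16 items.
loss = intra_modal_info_nce(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```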