2021 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme51207.2021.9428194

Multi-Graph Based Hierarchical Semantic Fusion for Cross-Modal Representation

Abstract: The main challenge of cross-modal retrieval is how to efficiently realize semantic alignment and reduce the heterogeneity gap. However, existing approaches ignore the multi-grained semantic knowledge that can be learned from different modalities. To this end, this paper proposes a novel end-to-end cross-modal representation method, termed Multi-Graph based Hierarchical Semantic Fusion (MG-HSF). This method integrates multi-graph hierarchical semantic fusion with cross-modal adversarial learning, which capture…
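The abstract names cross-modal adversarial learning as one ingredient for shrinking the heterogeneity gap. The sketch below is not the MG-HSF implementation; it only illustrates the generic idea of projecting two modalities into a shared space and training a modality discriminator adversarially. All class names, network shapes, and feature dimensions here are assumptions made for illustration.

```python
# Minimal sketch of cross-modal adversarial alignment (NOT the MG-HSF code):
# image and text features are projected into a shared space, and a modality
# discriminator is trained adversarially so the two embedding distributions
# become indistinguishable, which reduces the heterogeneity gap.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps modality-specific features into a shared embedding space."""
    def __init__(self, in_dim: int, shared_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, shared_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class ModalityDiscriminator(nn.Module):
    """Predicts whether a shared-space embedding came from an image or a text."""
    def __init__(self, shared_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)

# Toy batch: 2048-d image features, 300-d text features (dimensions are assumptions).
img_feat, txt_feat = torch.randn(32, 2048), torch.randn(32, 300)
img_proj, txt_proj = ProjectionHead(2048), ProjectionHead(300)
disc = ModalityDiscriminator()

z_img, z_txt = img_proj(img_feat), txt_proj(txt_feat)

# Discriminator loss: tell image embeddings (label 1) apart from text (label 0).
d_logits = torch.cat([disc(z_img.detach()), disc(z_txt.detach())])
d_labels = torch.cat([torch.ones(32), torch.zeros(32)])
d_loss = F.binary_cross_entropy_with_logits(d_logits, d_labels)

# Projection ("generator") loss: fool the discriminator so the modalities align.
g_logits = torch.cat([disc(z_img), disc(z_txt)])
g_loss = F.binary_cross_entropy_with_logits(g_logits, 1.0 - d_labels)
print(d_loss.item(), g_loss.item())
```

In an actual training loop the two losses would be stepped with separate optimizers for the discriminator and the projection heads; the sketch only shows how the losses are formed.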

Cited by 17 publications (6 citation statements); references 24 publications (30 reference statements).
“…Cross-modal retrieval [22][23][24] is an important issue in the field of information retrieval and machine learning, as shown in Figure 1. It aims at retrieval or correlation matching between different types of data (such as images, text, audio, etc.…”
Section: Cross-modal Retrieval
confidence: 99%
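As a concrete illustration of the retrieval step this statement describes, here is a minimal sketch assuming image and text embeddings already live in a common space of a hypothetical dimension (128); the function and variable names are illustrative and not taken from any cited system.

```python
# Hedged sketch of cross-modal retrieval by similarity ranking: given embeddings
# that already share a space, a text query retrieves images by cosine similarity.
import torch
import torch.nn.functional as F

def rank_by_similarity(query: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """Return gallery indices sorted from most to least similar to the query."""
    q = F.normalize(query, dim=-1)    # (d,)
    g = F.normalize(gallery, dim=-1)  # (n, d)
    scores = g @ q                    # cosine similarity per gallery item
    return torch.argsort(scores, descending=True)

# Toy example: one 128-d text query against 1000 image embeddings.
text_query = torch.randn(128)
image_gallery = torch.randn(1000, 128)
top10 = rank_by_similarity(text_query, image_gallery)[:10]
print(top10.tolist())
```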
“…Multi-modal learning refers to learning from data that comes in more than one source or form. It can be divided into five categories: multi-modal representation learning (Zhang C. et al, 2021), modal transformation, alignment (Zhu et al, 2022), multi-modal fusion, and collaborative learning (Li et al, 2019). In this paper, because we use a multi-modal feature selection algorithm, we focus on multi-modal feature selection within multi-modal representation learning.…”
Section: Related Work
confidence: 99%
“…This situation indicates that effectively enhancing intra-modal semantic alignment is significant for improving recipe retrieval performance. For this purpose, a straightforward approach applied in many cross-modal retrieval tasks [46][47][48] is to apply a metric learning or contrastive learning strategy within each modality. However, there is a non-trivial issue, i.e., food image ambiguity, in cross-modal recipe retrieval that has not yet been considered.…”
Section: Introduction
confidence: 99%
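The "contrastive learning strategy within each modality" mentioned in the statement above is often instantiated as an InfoNCE-style loss between two views of the same item. The sketch below shows one such instantiation under assumed tensor names and dimensions; it is not the code of any cited work.

```python
# Illustrative intra-modal contrastive (InfoNCE-style) loss: anchors[i] and
# positives[i] are two embeddings of the same item (e.g., two augmented views
# of one food image); every other row in the batch serves as a negative.
import torch
import torch.nn.functional as F

def intra_modal_info_nce(anchors: torch.Tensor,
                         positives: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(a.size(0))  # matching pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random 256-d embeddings for a batch of 16 items.
loss = intra_modal_info_nce(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```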