HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval

Zhang, Chengyuan; Song, Jiayu; Zhu, Xiaofeng; Zhu, Lei; Zhang, Shichao

doi:10.1145/3412847

Cited by 29 publications

(15 citation statements)

References 60 publications

“…The purpose of cross-modal retrieval is to enable flexible retrieval across different modalities, e.g., retrieve semantically matching images for a given text query. Existing methods can be roughly divided into two categories, i.e., subspace learning methods [29–31] based on CCA [4] and supervised learning methods [2, 6, 32] based on DNNs.…”

Section: Related Work (mentioning)

confidence: 99%

Learning discriminative common alignments for cross-modal retrieval

Liu, Chen, Hong et al. 2024

J. Electron. Imag.

Cross-modal retrieval aims to find alignment relationships between different modalities and then compute the semantic similarities used for ranking. Because of the data distribution difference and inherent heterogeneity gap between modalities, a classic solution is to learn common representations in the common space, which could preserve the discrimination among the samples from different categories and alleviate the cross-modal discrepancy. To achieve this, we propose a method, termed LDCA, to learn discriminative common alignments based on the modal representations. LDCA utilizes a modality invariance loss that pushes away the hardest negative sample to further reduce the cross-modal discrepancy at the feature level. In addition, LDCA seeks alignments in the label space to improve the intra-modal discrimination by an effective cross-modal label loss. Extensive experiments are conducted on five widely used cross-modal datasets to evaluate the proposed LDCA. The integral experimental results prove the method's superiority, and the comprehensive analyses verify the effectiveness of the method.

“…As a hot issue widely concerned, cross-modal retrieval problem is studied by a growing number of researchers [4,5,25,29,35,40,50]. According to the representation type of multimedia instances, cross-modal retrieval can be divided into two groups: real-valued representation based retrieval and binary representation (hash code) based retrieval.…”

Section: Related Work (mentioning)

confidence: 99%

MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval

Zhu, Cai, Song et al. 2022

Proceedings of the 2022 International Conference on Multimedia Retrieval

Self Cite

Cross-modal hashing is a hot issue in the multimedia community, which is to generate compact hash code from multimedia content for efficient cross-modal search. Two challenges, i.e., (1) How to efficiently enhance cross-modal semantic mining is essential for cross-modal hash code learning, and (2) How to combine multiple semantic correlations learning to improve the semantic similarity preserving, cannot be ignored. To this end, this paper proposed a novel end-to-end cross-modal hashing approach, named Multiple Semantic Structure-Preserving Quantization (MSSPQ) that is to integrate deep hashing model with multiple semantic correlation learning to boost hash learning performance. The multiple semantic correlation learning consists of inter-modal and intra-modal pairwise correlation learning and Cosine correlation learning, which can comprehensively capture cross-modal consistent semantics and realize semantic similarity preserving. Extensive experiments are conducted on three multimedia datasets, which confirms that the proposed method outperforms the baselines.

“…Multi-modal learning means that there are more than one source and form of data, and the process of learning in these forms is called multi-modal learning. Multi-modal learning can be divided into five categories: multi-modal representation learning (Zhang C. et al, 2021 ), modal transformation, alignment (Zhu et al, 2022 ), multi-modal fusion, and collaborative learning (Li et al, 2019 ). In this paper, because we use multi-modal feature selection algorithm, we focus on multi-modal feature selection in multi-modal representation learning.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-modal feature selection with anchor graph for Alzheimer's disease

Xu, Yu et al. 2022

Front. Neurosci.

Self Cite

In Alzheimer's disease, the researchers found that if the patients were treated at the early stage of the disease, it could effectively delay the development of the disease. At present, multi-modal feature selection is widely used in the early diagnosis of Alzheimer's disease. However, existing multi-modal feature selection algorithms focus on learning the internal information of multiple modalities. They ignore the relationship between modalities, the importance of each modality and the local structure in the multi-modal data. In this paper, we propose a multi-modal feature selection algorithm with anchor graph for Alzheimer's disease. Specifically, we first use the least square loss and ℓ2,1-norm to obtain the weight of the feature under each modality. Then we embed a modal weight factor into the objective function to obtain the importance of each modality. Finally, we use anchor graph to quickly learn the local structure information in multi-modal data. In addition, we also verify the validity of the proposed algorithm on the published ADNI dataset.
