Proceedings of the 2022 International Conference on Multimedia Retrieval
DOI: 10.1145/3512527.3531417
MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval

Abstract: Cross-modal hashing, which generates compact hash codes from multimedia content for efficient cross-modal search, is an active topic in the multimedia community. Two challenges cannot be ignored: (1) how to efficiently enhance cross-modal semantic mining, which is essential for cross-modal hash code learning, and (2) how to combine multiple forms of semantic correlation learning to improve semantic similarity preservation. To this end, this paper proposes a novel end-to-end cross-modal hashing approach, named Multiple Semantic Structure-Preserving Quantization (MSSPQ)…
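
To make the setting concrete, here is a minimal, hypothetical sketch of deep cross-modal hashing in general, not of MSSPQ itself: modality-specific encoders map image and text features into a shared K-bit Hamming space, tanh relaxes the binary constraint during training, and sign() binarizes the codes for retrieval. The feature dimensions and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

K = 32  # hash code length in bits; a common setting in this literature

class HashHead(nn.Module):
    """Map a modality-specific feature vector to a K-dim relaxed hash code."""
    def __init__(self, in_dim, bits=K):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, bits), nn.Tanh(),  # outputs lie in (-1, 1)
        )

    def forward(self, x):
        return self.net(x)

image_head = HashHead(in_dim=4096)  # e.g. CNN image features (assumed dim)
text_head = HashHead(in_dim=1386)   # e.g. bag-of-words text features (assumed dim)

def hamming_rank(query_codes, db_codes):
    """Rank database items by Hamming distance to each query code."""
    q, db = torch.sign(query_codes), torch.sign(db_codes)  # binarize to {-1, +1}
    dist = 0.5 * (K - q @ db.T)  # Hamming distance via inner product of +/-1 codes
    return dist.argsort(dim=1)

# Toy usage: rank 100 candidate texts against one image query.
ranking = hamming_rank(image_head(torch.randn(1, 4096)),
                       text_head(torch.randn(100, 1386)))

Methods in this family, MSSPQ included, differ mainly in the semantic similarity-preserving losses they impose on the relaxed codes during training.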

Cited by 8 publications (4 citation statements)
References 50 publications
“…Cross-modal hashing methods can be roughly divided into unsupervised [39], [40], [41] and supervised ones [42], [43], [44], [45]. Unsupervised methods typically leverage paired data to learn the correlations between modalities.…”
Section: Cross-modal Hashing Methods
confidence: 99%
“…We compare the proposed SCH with some classical shallow-feature-based baselines, including CVH [82], STMH [83], CMSSH [84], SCM [85], and SePH [19], and several state-of-the-art end-to-end cross-modal hashing methods, including DCMH [21], ATFH-N [86], CHN [87], SSAH [22], EGDH [12], AGAH [61], MSSPQ [45], HMAH [88], MAFH [89], and MIAN [23]. To ensure fair comparisons with the shallow-feature-based baselines, we utilize 4096-dimensional image features extracted by the pre-trained VGG-19 network as the input.…”
Section: Compared Methods
confidence: 99%
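
For readers reproducing this protocol, below is a small sketch of extracting the 4096-dimensional fc7 features from a pre-trained VGG-19 using standard torchvision APIs; the image filename is a hypothetical placeholder.

import torch
from torchvision import models, transforms
from PIL import Image

# Load VGG-19 pre-trained on ImageNet and truncate the classifier after fc7:
# Linear(25088->4096) -> ReLU -> Dropout -> Linear(4096->4096) -> ReLU
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:5])
vgg.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    img = preprocess(Image.open("example.jpg")).unsqueeze(0)  # hypothetical file
    feat = vgg(img)  # shape: (1, 4096), the fc7 features used by the baselines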
“…With the rapid development of multimedia technology, tasks such as image processing and recognition [8–10], cross-modal learning [1, 2, 11–17], and multi-modal/media analysis [18–23] using multimodal and multiview data [24–28] are becoming more and more popular. In this section, we mainly discuss related work about cross-modal retrieval.…”
Section: Related Work
confidence: 99%
“…The main challenge of cross-modal recipe retrieval is mitigating the heterogeneity between food images and recipes, which is, to some extent, more difficult than the conventional image-text retrieval task [10]–[16] due to more intricate data. The usual treatment is employing independent neural networks to encode images and their corresponding recipes so as to align them in a common feature subspace.…”
Section: Introduction
confidence: 99%
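
As an illustration of that usual treatment, here is a hedged sketch of two independent encoders aligned in a common subspace with a symmetric contrastive loss; the architectures, feature dimensions, and temperature value are assumptions for illustration, not taken from any cited paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """One independent, modality-specific encoder projecting into the shared subspace."""
    def __init__(self, in_dim, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                  nn.Linear(512, embed_dim))

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)  # unit-norm embeddings

image_tower = Tower(in_dim=2048)   # e.g. CNN image features (assumed dim)
recipe_tower = Tower(in_dim=768)   # e.g. text-encoded recipe features (assumed dim)

def alignment_loss(img_emb, rec_emb, tau=0.07):
    """Symmetric contrastive loss: paired items attract, unpaired items repel."""
    logits = img_emb @ rec_emb.T / tau    # (B, B) cosine similarities / temperature
    targets = torch.arange(len(img_emb))  # row i pairs with column i
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy batch of 8 image-recipe pairs.
loss = alignment_loss(image_tower(torch.randn(8, 2048)),
                      recipe_tower(torch.randn(8, 768)))

Because the two towers share no weights, each modality can use whatever backbone suits it; only the output subspace, shaped by the alignment loss, is common.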