Proceedings of the 2022 International Conference on Multimedia Retrieval
DOI: 10.1145/3512527.3531417
MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval

Abstract: Cross-modal hashing, which generates compact hash codes from multimedia content for efficient cross-modal search, is an active topic in the multimedia community. Two challenges cannot be ignored: (1) how to efficiently enhance cross-modal semantic mining, which is essential for cross-modal hash code learning, and (2) how to combine multiple forms of semantic correlation learning to improve semantic similarity preservation. To this end, this paper proposes a novel end-to-end cross-modal hashing approach, named Multiple Semantic Structure-Preserving Quantization (MSSPQ)…
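
To make the setting concrete, here is a minimal, hypothetical sketch of deep cross-modal hashing in general, not of MSSPQ itself: modality-specific encoders map image and text features into a shared K-bit Hamming space, tanh relaxes the binary constraint during training, and sign() binarizes the codes for retrieval. The feature dimensions and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

K = 32  # hash code length in bits; a common setting in this literature

class HashHead(nn.Module):
    """Map a modality-specific feature vector to a K-dim relaxed hash code."""
    def __init__(self, in_dim, bits=K):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, bits), nn.Tanh(),  # outputs lie in (-1, 1)
        )

    def forward(self, x):
        return self.net(x)

image_head = HashHead(in_dim=4096)  # e.g. CNN image features (assumed dim)
text_head = HashHead(in_dim=1386)   # e.g. bag-of-words text features (assumed dim)

def hamming_rank(query_codes, db_codes):
    """Rank database items by Hamming distance to each query code."""
    q, db = torch.sign(query_codes), torch.sign(db_codes)  # binarize to {-1, +1}
    dist = 0.5 * (K - q @ db.T)  # Hamming distance via inner product of +/-1 codes
    return dist.argsort(dim=1)

# Toy usage: rank 100 candidate texts against one image query.
ranking = hamming_rank(image_head(torch.randn(1, 4096)),
                       text_head(torch.randn(100, 1386)))

Methods in this family, MSSPQ included, differ mainly in the semantic similarity-preserving losses they impose on the relaxed codes during training.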

Cited by 8 publications (4 citation statements)
References 50 publications
“…Cross-modal hashing methods can be roughly divided into unsupervised [39], [40], [41] and supervised ones [42], [43], [44], [45]. Unsupervised methods typically leverage paired data to learn the correlations between modalities.…”
Section: Cross-modal Hashing Methods
confidence: 99%
“…We compare the proposed SCH with some classical shallow-feature-based baselines, including CVH [82], STMH [83], CMSSH [84], SCM [85], and SePH [19], and several state-of-the-art end-to-end cross-modal hashing methods, including DCMH [21], ATFH-N [86], CHN [87], SSAH [22], EGDH [12], AGAH [61], MSSPQ [45], HMAH [88], MAFH [89], and MIAN [23]. To ensure fair comparisons with the shallow-feature-based baselines, we utilize 4096-dimensional image features extracted by the pre-trained VGG-19 network as the input.…”
Section: Compared Methods
confidence: 99%
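
For readers reproducing this protocol, below is a small sketch of extracting the 4096-dimensional fc7 features from a pre-trained VGG-19 using standard torchvision APIs; the image filename is a hypothetical placeholder.

import torch
from torchvision import models, transforms
from PIL import Image

# Load VGG-19 pre-trained on ImageNet and truncate the classifier after fc7:
# Linear(25088->4096) -> ReLU -> Dropout -> Linear(4096->4096) -> ReLU
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:5])
vgg.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    img = preprocess(Image.open("example.jpg")).unsqueeze(0)  # hypothetical file
    feat = vgg(img)  # shape: (1, 4096), the fc7 features used by the baselines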
“…With the rapid development of multimedia technology, tasks such as image processing and recognition [8–10], cross-modal learning [1, 2, 11–17], and multi-modal/media analysis [18–23] using multimodal and multiview data [24–28] are becoming more and more popular. In this section, we mainly discuss related work about cross-modal retrieval.…”
Section: Related Work
confidence: 99%
“…The main challenge of cross-modal recipe retrieval is mitigating the heterogeneity between food images and recipes, which is, to some extent, more difficult than the conventional image-text retrieval task [10]–[16] due to more intricate data. The usual treatment is employing independent neural networks to encode images and their corresponding recipes so as to align them in a common feature subspace.…”
Section: Introduction
confidence: 99%
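
As an illustration of that usual treatment, here is a hedged sketch of two independent encoders aligned in a common subspace with a symmetric contrastive loss; the architectures, feature dimensions, and temperature value are assumptions for illustration, not taken from any cited paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """One independent, modality-specific encoder projecting into the shared subspace."""
    def __init__(self, in_dim, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                  nn.Linear(512, embed_dim))

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)  # unit-norm embeddings

image_tower = Tower(in_dim=2048)   # e.g. CNN image features (assumed dim)
recipe_tower = Tower(in_dim=768)   # e.g. text-encoded recipe features (assumed dim)

def alignment_loss(img_emb, rec_emb, tau=0.07):
    """Symmetric contrastive loss: paired items attract, unpaired items repel."""
    logits = img_emb @ rec_emb.T / tau    # (B, B) cosine similarities / temperature
    targets = torch.arange(len(img_emb))  # row i pairs with column i
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy batch of 8 image-recipe pairs.
loss = alignment_loss(image_tower(torch.randn(8, 2048)),
                      recipe_tower(torch.randn(8, 768)))

Because the two towers share no weights, each modality can use whatever backbone suits it; only the output subspace, shaped by the alignment loss, is common.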