Adversarial Learning Based Semantic Correlation Representation for Cross-Modal Retrieval

Zhu, Lei; Song, Jiayu; Wei, Xiangxiang; Long, Jun

doi:10.20944/preprints202001.0288.v1

Cited by 6 publications

(8 citation statements)

References 40 publications


“…So, a modality-specific and shared generative adversarial network (MS²GAN) approach is proposed in [18] which incorporates two separate sub-networks and a common subnetwork for learning modality-specific and modality-shared features respectively. [19] has introduced a novel end-to-end framework known as adversarial learning based semantic correlation representation (ALSCOR) framework which combines cross-modal representation learning, adversarial, and correlation learning. Non-linear correlation is captured by integrating the CCA model with TxtNet and VisNet representation models.…”

Section: Generative Adversarial Network (mentioning)

confidence: 99%

“…Table (5) shows the I2T, T2I and their average MAP score values for respective dataset categories. Figure (19) demonstrates a curve depicting the precision values obtained for each test query (image in case of I2T and text in case of T2I operation) in a sorted manner and the change in precision values as per the queries can be visualized. Figure (20) illustrates a few matched images and text results retrieved using an image query on trained Proposed2 model.…”

Section: Parameter Settings (mentioning)

confidence: 99%


Hybrid SOM based cross-modal retrieval exploiting Hebbian learning

Kaur

Malhi

Pannu

2022

Knowledge-Based Systems
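The Parameter Settings statement above refers to I2T, T2I, and average MAP scores. As a rough illustration only, mean average precision over ranked retrieval results could be computed as sketched below. The function names and the toy relevance lists are hypothetical and are not taken from the cited paper, whose exact evaluation protocol is not given here.

```python
def average_precision(relevance):
    """Average precision for one query.

    `relevance` is the ranked result list as 0/1 flags
    (1 = retrieved item shares the query's category).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_relevance_lists):
    """Mean of the per-query average precision values."""
    if not ranked_relevance_lists:
        return 0.0
    total = sum(average_precision(r) for r in ranked_relevance_lists)
    return total / len(ranked_relevance_lists)


# Toy data (assumed): two image queries (I2T) and two text queries (T2I).
i2t_results = [[1, 0, 1, 0], [0, 1, 0, 0]]
t2i_results = [[1, 1, 0, 0], [0, 0, 1, 0]]

map_i2t = mean_average_precision(i2t_results)
map_t2i = mean_average_precision(t2i_results)
map_avg = (map_i2t + map_t2i) / 2
```

Sorting the per-query values returned by `average_precision` would give a per-query curve similar in spirit to the one the statement describes, under the same assumptions.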


“…As a hot issue widely concerned, cross-modal retrieval problem is studied by a growing number of researchers [4,5,25,29,35,40,50]. According to the representation type of multimedia instances, cross-modal retrieval can be divided into two groups: real-valued representation based retrieval and binary representation (hash code) based retrieval.…”

Section: Related Work (mentioning)

confidence: 99%

MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval

Zhu

Cai

Song

et al. 2022

Proceedings of the 2022 International Conference on Multimedia Retrieval

Self Cite


Cross-modal hashing is a hot issue in the multimedia community, which is to generate compact hash code from multimedia content for efficient cross-modal search. Two challenges, i.e., (1) How to efficiently enhance cross-modal semantic mining is essential for cross-modal hash code learning, and (2) How to combine multiple semantic correlations learning to improve the semantic similarity preserving, cannot be ignored. To this end, this paper proposed a novel end-to-end cross-modal hashing approach, named Multiple Semantic Structure-Preserving Quantization (MSSPQ) that is to integrate deep hashing model with multiple semantic correlation learning to boost hash learning performance. The multiple semantic correlation learning consists of inter-modal and intra-modal pairwise correlation learning and Cosine correlation learning, which can comprehensively capture cross-modal consistent semantics and realize semantic similarity preserving. Extensive experiments are conducted on three multimedia datasets, which confirms that the proposed method outperforms the baselines. CCS Concepts: Information systems → Multimedia and multimodal retrieval.
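The Related Work statement above splits cross-modal retrieval into real-valued and binary (hash code) approaches. The sketch below contrasts the two on toy vectors, assuming both modalities have already been embedded in a common space. Sign binarization, the helper names, and the sample vectors are illustrative assumptions and do not reproduce MSSPQ or any other cited method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two real-valued embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def to_hash_code(vec):
    """Binarize by sign: 1 for positive components, else 0 (assumed scheme)."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(code_a, code_b):
    """Number of bit positions where two hash codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def rank_real_valued(query, gallery):
    """Gallery indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: -cosine_similarity(query, gallery[i]))


def rank_binary(query, gallery):
    """Gallery indices sorted by ascending Hamming distance between codes."""
    query_code = to_hash_code(query)
    return sorted(range(len(gallery)),
                  key=lambda i: hamming_distance(query_code,
                                                 to_hash_code(gallery[i])))


# Toy common-space embeddings (assumed): one image query, three text items.
image_query = [0.9, -0.2, 0.4, -0.7]
text_gallery = [
    [0.8, -0.1, 0.5, -0.6],
    [-0.7, 0.3, -0.2, 0.9],
    [0.1, 0.9, -0.3, -0.2],
]

real_ranking = rank_real_valued(image_query, text_gallery)
binary_ranking = rank_binary(image_query, text_gallery)
```

Hash codes trade some ranking precision for compact storage and cheap bitwise comparison, which is the efficiency motivation the abstract above gives for cross-modal hashing.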


“…Building embeddings for different modalities in a common semantic space has been another popular way over the past few years. This method allows the model to compute cross-modal similarity, which can be further used for downstream tasks, such as cross-media retrieval [40][41][42]. Ba et al [43] presented a model that can classify unseen categories from their textual description by cross-modal similarity in Zero-Shot Learning (ZSL).…”

Section: Cross-modal Representation (mentioning)

confidence: 99%

PTF‐SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric

et al. 2022


Image similarity metric, also known as metric learning (ML) in computer vision, is a significant step in various advanced image tasks. Nevertheless, existing well-performing approaches for image similarity measurement only focus on the image itself without utilizing the information of other modalities, while pictures always appear with the described text. Furthermore, those methods need human supervision, yet most images are unlabeled in the real world. Considering the above problems comprehensively, we present a novel visual similarity metric model named PTF-SimCM. It adopts a self-supervised contrastive structure like SimSiam and incorporates a multimodal fusion module to utilize textual modality correlated to the image. We apply a cross-modal model for text modality rather than a standard unimodal text encoder to improve late fusion productivity. In addition, the proposed model employs Sentence PIE-Net to solve the issue caused by polysemous sentences. For simplicity and efficiency, our model learns a specific embedding space where distances directly correspond to the similarity. Experimental results on MSCOCO, Flickr 30k, and Pascal Sentence datasets show that our model overall outperforms all the compared methods in this work, which illustrates that the model can effectively address the issues faced and enhance the performances on unsupervised visual similarity measuring relatively.

