2015 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2015.466
Multi-label Cross-Modal Retrieval


Cited by 195 publications (91 citation statements). References 26 publications.
“…A similar phenomenon can also be observed for other extensions of harmonized GPLVM models. Compared to existing subspace learning approaches [29], [64], which usually fix the dimensionality of the common space to 10 as reported in their papers, we obtain a lower-dimensional embedding that summarizes the high-dimensional data, which also demonstrates the remarkable representation learning ability of our non-linear, nonparametric model. Thus we can improve the efficiency of our model with latent embeddings of lower dimensionality.…”
Section: Dimensionality of the Latent Space
confidence: 81%
“…In a pioneering work [20], Canonical Correlation Analysis (CCA) [8] was used to learn linear projections for each modality by finding a set of canonical coefficients that define a subspace in which the modalities are maximally correlated. This approach was extended to the multi-label scenario by using label information to establish correspondences between instances [18]. A multi-view kernel CCA formulation is proposed in [4], where a joint space for visual, textual and semantic information is learned.…”
Section: Related Work
confidence: 99%
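
As a rough illustration of the CCA-based cross-modal projection described in the statement above, the following minimal sketch uses scikit-learn's CCA to project toy image and text features into a shared subspace and rank one modality against the other; the feature dimensions, random data, and 10-dimensional common space are illustrative assumptions, not the exact setup of the cited works.

```python
# Minimal sketch of CCA-based cross-modal subspace learning
# (hypothetical data and dimensions, not the cited papers' setup).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_pairs = 200
img_feats = rng.standard_normal((n_pairs, 128))   # e.g. visual descriptors
txt_feats = rng.standard_normal((n_pairs, 64))    # e.g. text/tag descriptors

# Learn linear projections that maximize correlation between the two views.
cca = CCA(n_components=10)                        # common-space dimensionality
cca.fit(img_feats, txt_feats)
img_proj, txt_proj = cca.transform(img_feats, txt_feats)

# Cross-modal retrieval: rank text items by cosine similarity to an image query.
def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

ranking = np.argsort(-cosine_sim(img_proj[:1], txt_proj), axis=1)
print(ranking[0][:5])  # indices of the top-5 retrieved text items
```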
“…First, these methods simply and directly adopt single-class labels to measure semantic relevance across modalities [9], [12]. In fact, in standard cross-modal benchmark datasets such as NUS-WIDE [6] and Microsoft COCO [15], an image instance can be assigned multiple category labels [27], which is beneficial because it allows semantic relevance across modalities to be described more accurately. Second, these methods narrow the modality gap by constraining the corresponding hash codes with certain pre-defined loss functions [4].…”
Section: Introduction
confidence: 99%
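
To make the multi-label point concrete, the sketch below contrasts a single-label relevance test with a graded multi-label relevance score computed as the cosine similarity of multi-hot label vectors; the label vocabulary and example annotations are hypothetical and not taken from the cited methods.

```python
# Illustrative comparison of single-label vs. multi-label semantic relevance
# (hypothetical labels; not the exact measure used in the cited methods).
import numpy as np

# Multi-hot label vectors over a small vocabulary, e.g. NUS-WIDE-style tags.
labels = ["person", "beach", "dog", "sunset", "car"]
img_labels = np.array([1, 1, 0, 1, 0])   # image tagged person/beach/sunset
txt_labels = np.array([1, 1, 0, 0, 0])   # caption mentioning person/beach

# Single-label view: relevant only if one dominant label matches exactly.
single_label_relevant = img_labels.argmax() == txt_labels.argmax()

# Multi-label view: graded relevance from label-vector overlap.
def label_cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

multi_label_relevance = label_cosine(img_labels, txt_labels)  # ~0.82 here
print(single_label_relevant, round(multi_label_relevance, 2))
```

The graded score distinguishes partially related image-text pairs from unrelated ones, which is the finer notion of cross-modal relevance the statement attributes to multi-label annotations.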