2020
DOI: 10.1109/tcyb.2018.2879846
MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval

Abstract: Cross-modal retrieval has drawn wide interest for retrieval across different modalities of data (such as text, image, video, audio and 3D model). However, existing methods based on deep neural network (DNN) often face the challenge of insufficient cross-modal training data, which limits the training effectiveness and easily leads to overfitting. Transfer learning is usually adopted for relieving the problem of insufficient training data, but it mainly focuses on knowledge transfer only from large-scale dataset…
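The title refers to modal-adversarial learning of a common cross-modal space. As a rough illustration only (the paper's actual subnetworks, layer sizes, and losses are not reproduced here), the sketch below shows the generic modal-adversarial recipe: modality-specific encoders trained with a semantic classification loss while a modality discriminator, attached through a gradient-reversal layer, pushes the learned representations to become modality-indistinguishable. All dimensions and the two-modality setup are illustrative assumptions.

# Hedged sketch (not the authors' code): modal-adversarial training of a common
# embedding space. A gradient-reversal layer makes the encoders learn features
# that a modality discriminator cannot tell apart, while a label classifier
# keeps the space semantically discriminative. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoders.
        return -ctx.lam * grad_output, None

class CommonSpaceModel(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=300, common_dim=256, n_classes=10):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(), nn.Linear(1024, common_dim))
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, 1024), nn.ReLU(), nn.Linear(1024, common_dim))
        self.label_clf = nn.Linear(common_dim, n_classes)   # semantic (category) head
        self.modality_clf = nn.Linear(common_dim, 2)        # image-vs-text discriminator

    def forward(self, img_feat, txt_feat, lam=1.0):
        zi, zt = self.img_enc(img_feat), self.txt_enc(txt_feat)
        z = torch.cat([zi, zt], dim=0)
        sem_logits = self.label_clf(z)
        mod_logits = self.modality_clf(GradReverse.apply(z, lam))
        return zi, zt, sem_logits, mod_logits

def training_step(model, img_feat, txt_feat, labels, lam=0.5):
    zi, zt, sem_logits, mod_logits = model(img_feat, txt_feat, lam)
    # Semantic loss keeps the space discriminative; adversarial loss makes it modality-invariant.
    sem_loss = F.cross_entropy(sem_logits, torch.cat([labels, labels], dim=0))
    mod_targets = torch.cat([torch.zeros(len(zi)), torch.ones(len(zt))]).long()
    adv_loss = F.cross_entropy(mod_logits, mod_targets)
    return sem_loss + adv_loss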

Cited by 103 publications (77 citation statements)
References 44 publications
“…
Method           I→T    I→A    I→V    T→I    T→A    T→V    A→I    A→T    A→V    V→I    V→T    V→A    Average
Our FGCrossNet   0.210  0.526  0.606  0.255  0.181  0.208  0.553  0.159  0.443  0.629  0.195  0.437  0.366
MHTN [20]        0.116  0.195  0.281  0.124  0.138  0.185  0.196  0.127  0.290  0.306  0.186  0.306  0.204
ACMR [21]        0.162  0.119  0.477  0.075  0.015  0.081  0.128  0.028  0.068  0.536  0.138  0.111  0.162
JRL [22]         0.160  0.085  0.435  0.190  0.028  0.095  0.115  0.035  0.065  0.517  0.126  0.068  0.160
GSPH [23]        0.140  0.098  0.413  0.179  0.024  0.109  0.129  0.024  0.073  0.512  0.126  0.086  0.159
CMDN [24]        0.099  0.009  0.377  0.123  0.007  0.078  0.017  0.008  0.010  0.446  0.081  0.009  0.105
SCAN [25]        0.050…”
Section: Methods (unclassified)
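The figures in the table above appear to be mean average precision (mAP) scores, one per retrieval direction (e.g., I→T: image queries retrieving text items). As a hedged illustration of how such a score is computed, and not the evaluation code of any cited paper, a minimal mAP routine might look like this (cosine ranking and class-label relevance are assumptions):

# Hedged sketch: mAP for one retrieval direction, e.g. I->T.
# Ranking by cosine similarity and same-class relevance are illustrative assumptions.
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query, given a 0/1 relevance list in ranked order."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(query_feats, gallery_feats, query_labels, gallery_labels):
    """mAP over all queries; an item is relevant if it shares the query's class label."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                                  # cosine similarities
    aps = []
    for i in range(len(q)):
        order = np.argsort(-sims[i])                # best match first
        rel = (gallery_labels[order] == query_labels[i]).astype(int)
        aps.append(average_precision(rel))
    return float(np.mean(aps))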
“…On the other hand, the cross-modal hashing methods mainly focus on retrieval efficiency by mapping the items of different modalities into a common binary Hamming space. Benefiting from the strong ability of distribution modeling and discriminative representation learning, some recent cross-modal retrieval methods have collaborated with GAN models [9,10,2]. In this work, our method also follows a similar adversarial learning framework that uses the single-modal similarities to guide the cross-modal representation learning.…”
Section: Related Work (mentioning)
confidence: 99%
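The statement above describes guiding cross-modal representation learning with single-modal similarities. One hedged way to read that, shown purely as a sketch and not as the cited method's actual objective, is to treat within-modality cosine similarities as soft targets for the cross-modal similarity matrix:

# Hedged sketch of "single-modal similarities guiding cross-modal learning":
# within-modality cosine similarities act as soft targets for cross-modal ones.
import torch
import torch.nn.functional as F

def similarity_transfer_loss(img_emb, txt_emb):
    zi = F.normalize(img_emb, dim=1)
    zt = F.normalize(txt_emb, dim=1)
    sim_ii = zi @ zi.t()          # single-modal (image-image) similarities
    sim_tt = zt @ zt.t()          # single-modal (text-text) similarities
    sim_it = zi @ zt.t()          # cross-modal similarities to be learned
    # Pull cross-modal similarities toward the (detached) single-modal structure.
    target = 0.5 * (sim_ii + sim_tt).detach()
    return F.mse_loss(sim_it, target)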
“…The ACMR [2] method proposes a triplet loss and a modality classifier for preserving the modality-level semantic structures. MHTN [10] is proposed to minimize the maximum mean discrepancy between modalities, which preserves more flexibility for the generator to project vectors into a new space. The difference between CMST and the previous work is that CMST can learn the item-level semantic relationships between unpaired items in an unsupervised way.…”
Section: Related Work (mentioning)
confidence: 99%
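The quoted description says MHTN minimizes the maximum mean discrepancy (MMD) between modalities. A minimal batch MMD loss with a Gaussian kernel is sketched below; the kernel choice and bandwidth are illustrative assumptions, not MHTN's exact configuration:

# Hedged sketch: a simple (biased) batch estimate of MMD between two
# modalities' embedding batches, using a Gaussian (RBF) kernel.
import torch

def gaussian_kernel(x, y, sigma):
    d2 = torch.cdist(x, y, p=2) ** 2            # pairwise squared distances
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def mmd_loss(img_emb, txt_emb, sigma=1.0):
    k_xx = gaussian_kernel(img_emb, img_emb, sigma).mean()
    k_yy = gaussian_kernel(txt_emb, txt_emb, sigma).mean()
    k_xy = gaussian_kernel(img_emb, txt_emb, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy             # small value = distributions aligned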
“…Until now, these embeddings have been learned in a static manner, i.e., without preserving the time dimension, and thus ignoring the evolution of modality interactions [5,7,12,20,21,23,27,30,31,35,36]. Approaches have ranged from solutions that organize the space according to linear correlations [23,33,36] (image and text co-occurrence), semantics [20,30,34,35] (category information), and/or temporal correlations [27].…”
Section: Introduction (mentioning)
confidence: 99%
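The "linear correlations" family mentioned above (image and text co-occurrence) is typified by canonical correlation analysis style objectives. As a hedged sketch, assuming scikit-learn and synthetic features rather than any cited paper's data:

# Hedged sketch: projecting co-occurring image and text features into a
# maximally correlated (linear) common space with CCA. Feature dimensions
# and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(500, 128))     # one row per image
txt_feats = rng.normal(size=(500, 64))      # the co-occurring text for that image

cca = CCA(n_components=32)
cca.fit(img_feats, txt_feats)
img_proj, txt_proj = cca.transform(img_feats, txt_feats)   # common retrieval space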