2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00446
Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

Abstract: Thanks to the success of deep learning, cross-modal retrieval has made significant progress recently. However, there still remains a crucial bottleneck: how to bridge the modality gap to further enhance the retrieval accuracy. In this paper, we propose a self-supervised adversarial hashing (SSAH) approach, which lies among the early attempts to incorporate adversarial learning into cross-modal hashing in a self-supervised fashion. The primary contribution of this work is that two adversarial networks are lever…
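The abstract above describes learning binary hash codes so that image and text features can be compared across modalities. Below is a minimal sketch of the retrieval step common to cross-modal hashing methods of this kind: real-valued features are binarized with the sign function, and database items are ranked by Hamming distance to the query code. The random features and the 16-bit code length are illustrative stand-ins, not the paper's actual SSAH networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy continuous features for two modalities (e.g. image database, text query).
# In a real hashing network these would come from learned encoders.
img_feats = rng.standard_normal((5, 16))   # 5 database images, 16-dim features
txt_query = rng.standard_normal((1, 16))   # 1 text query

def binarize(x):
    """Map real-valued features to +/-1 hash codes via the sign function."""
    return np.where(x >= 0, 1, -1)

def hamming_distance(codes, query):
    """Row-wise Hamming distance between +/-1 code matrices.

    For +/-1 codes of length `bits`, d_H = (bits - dot(a, b)) / 2.
    """
    bits = codes.shape[1]
    return (bits - codes @ query.T) // 2

db_codes = binarize(img_feats)   # shape (5, 16), entries in {-1, +1}
q_code = binarize(txt_query)     # shape (1, 16)

dists = hamming_distance(db_codes, q_code).ravel()
ranking = np.argsort(dists)      # nearest database items first
```

Because both modalities are mapped into the same Hamming space, the same distance works regardless of which modality issued the query; this constant-time bitwise comparison is what makes hashing attractive for large-scale retrieval.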

Cited by 378 publications (220 citation statements)
References 42 publications (66 reference statements)
“…The performance of DistillHash first increases and then keeps at a relatively high level. The result is also not sensitive to p in the range of [32, 128]. For other experiments in this paper, we select p as 48.…”
Section: Parameter Sensitivity
confidence: 92%
“…On the other hand, supervised hashing methods [38], [39], [40], [41], [42] take full advantage of the label information to mitigate the semantic gap and improve the hashing quality, therefore attaining higher search accuracy than the unsupervised methods. In semantic correlation maximization hashing (SCMH) [39], semantic labels are merged into the hash learning procedure for large-scale data modeling.…”
Section: Cross-Modal Hashing
confidence: 99%
“…Nagrani et al. [37] demonstrated that a joint representation can be learned from facial and voice information and introduced a curriculum learning strategy [3,45,46] to perform hard negative mining during training. Text-to-image matching is a well-studied problem in computer vision [4,19,26,38,49,51,59,60,66,68], facilitated by datasets describing objects, birds, or flowers [29,44,67]. A relatively new application of text-to-image matching is person search, the task of which is to retrieve the most relevant frames of an individual given a textual description as an input.…”
Section: Related Work
confidence: 99%