2021
DOI: 10.1109/tip.2020.3038354
Deep Relation Embedding for Cross-Modal Retrieval

Citations: cited by 30 publications (13 citation statements)
References: 43 publications
“…For the MS-COCO dataset, part of the textual description is too ambiguous, which may make the model insensitive to the features in that part and lead to poor results. Still, it is comparable to some other models, such as CPGN [15], CAMP [10], and PVSE [12].…”
Section: Quantitative Results (supporting)
confidence: 54%
“…
Method              Sentence Retrieval     Image Retrieval       rSum
                    R@1   R@5   R@10       R@1   R@5   R@10
CPGN [15]           70.5  91.2  94.9       50.3  77.7  85.2      469.8
IMRAM [18]          74.1  93.0  96.6       53.9  79.4  87.2      484.2
CAAN [17]           70.1  91.6  97.2       52.8  79.0  87.9      478.6
VSRN [13]           71.3  90.6  96.0       54.7  81.8  88.2      482.6
CAMP [10]           68.1  89.7  95.2       51.5  77.1  85.3      466.9
TERAN(single) [5]   75.8  93.2  96.7       59.5  84.9  90.6      500.7
TERAN(ens.) [5]     …”
Section: Methods (mentioning)
confidence: 99%
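The table above reports Recall@K, the percentage of queries whose ground-truth match appears among the top K retrieved results, in both directions (sentence retrieval from images, image retrieval from sentences), plus rSum, the sum of the six recall values. As a minimal illustrative sketch of how these metrics are typically computed from a similarity matrix (not code from any of the cited papers; the one-match-per-query assumption is a simplification, since MS-COCO pairs each image with five captions):

```python
import numpy as np

def recall_at_k(sim: np.ndarray, k: int) -> float:
    """Recall@K (in percent) for a square similarity matrix.

    sim[i, j] = similarity between query i and candidate j; the
    ground-truth match for query i is assumed to be candidate i.
    """
    # Rank candidates for each query by descending similarity.
    ranking = np.argsort(-sim, axis=1)
    # A query counts as a hit if its ground-truth index is in the top K.
    hits = (ranking[:, :k] == np.arange(sim.shape[0])[:, None]).any(axis=1)
    return 100.0 * hits.mean()

def rsum(sim_i2t: np.ndarray, sim_t2i: np.ndarray) -> float:
    """rSum = sum of R@1, R@5, R@10 over both retrieval directions."""
    return sum(recall_at_k(s, k) for s in (sim_i2t, sim_t2i) for k in (1, 5, 10))

# Toy usage: random similarities for 100 image-text pairs.
rng = np.random.default_rng(0)
sim = rng.standard_normal((100, 100))
print(rsum(sim, sim.T))
```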
“…The core idea of cross-media search methods based on deep semantics is to learn complex high-level features with deep learning, improving the effect of feature learning. Zhang et al. [22] embed images and text into a latent common space using a Residual Network (ResNet) model to learn global features for sentence-generation learning. Peng and Qi [23] used bidirectional translation training to directly convert bidirectional pairs between visual and textual descriptions, capturing the cross-media correlations.…”
Section: Related Work (mentioning)
confidence: 99%
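To make the "latent common space" idea described above concrete, here is a minimal two-branch sketch in PyTorch. It is a generic illustration, not the model of Zhang et al. [22]: the ResNet-152 backbone, GRU text encoder, and 1024-dimensional embedding are all assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CommonSpaceModel(nn.Module):
    """Two-branch model projecting images and text into a shared space."""

    def __init__(self, vocab_size: int, embed_dim: int = 1024):
        super().__init__()
        # Image branch: a ResNet backbone whose classifier head is
        # replaced by a linear projection into the common space.
        resnet = models.resnet152(weights=None)  # torchvision >= 0.13 API
        resnet.fc = nn.Linear(resnet.fc.in_features, embed_dim)
        self.image_encoder = resnet
        # Text branch: word embeddings + GRU, with the final hidden state
        # serving as the sentence embedding (an illustrative choice).
        self.word_embed = nn.Embedding(vocab_size, 300)
        self.gru = nn.GRU(300, embed_dim, batch_first=True)

    def forward(self, images: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # L2-normalize both embeddings so the dot product is cosine similarity.
        img = nn.functional.normalize(self.image_encoder(images), dim=-1)
        _, h = self.gru(self.word_embed(captions))
        txt = nn.functional.normalize(h[-1], dim=-1)
        # Similarity matrix between every image and every caption in the batch.
        return img @ txt.t()
```

In practice, such two-branch models are trained with a triplet or contrastive ranking loss over this similarity matrix; the specific backbone, text encoder, and loss used in [22] may differ.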