2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00677
Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations

Cited by 114 publications (64 citation statements)
References 29 publications
“…In most configurations, CSLS is slightly better than IS on improving text→image inference, while IS is better at image→text. The best results (lines 3.8, 3.9) are even better than the recently reported state-of-the-art (Wu et al., 2019) (Table 4, line 3.14), which performs a naive nearest-neighbor search. This suggests that the hubness problem deserves much more attention and careful selection of inference methods is vital for text-image matching.…”
Section: Hubs During Inference
confidence: 71%
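The statement above contrasts CSLS with naive nearest-neighbor search as a remedy for hubness. Below is a minimal sketch of CSLS (Cross-domain Similarity Local Scaling) scoring for text-image matching, assuming pre-computed L2-normalized embeddings; the function name `csls_scores` and the toy data are illustrative, not taken from the cited work.

```python
import numpy as np

def csls_scores(txt, img, k=3):
    """CSLS scores between L2-normalized text and image embeddings.

    CSLS(t, i) = 2*cos(t, i) - r_img(t) - r_txt(i), where r_img(t) is the
    mean cosine similarity of text t to its k nearest image neighbors
    (and symmetrically for r_txt). Penalizing items that sit in dense
    cross-modal neighborhoods reduces the hubness effect.
    """
    sim = txt @ img.T  # cosine similarities, since inputs are normalized
    # mean similarity to the k nearest cross-modal neighbors, per row/column
    r_t = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_i = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sim - r_t - r_i

# Toy data: 5 texts and 5 images in an 8-d embedding space.
rng = np.random.default_rng(0)
txt = rng.normal(size=(5, 8))
img = rng.normal(size=(5, 8))
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
img /= np.linalg.norm(img, axis=1, keepdims=True)

scores = csls_scores(txt, img, k=2)
pred = scores.argmax(axis=1)  # text→image retrieval under CSLS
```

Naive nearest-neighbor search would instead take `argmax` over the raw similarity matrix `txt @ img.T`; CSLS re-ranks by discounting candidates with many close neighbors.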
“…Learning visually grounded semantics to facilitate cross-modal retrieval (i.e., image-to-text and text-to-image) is a challenging task for cross-modal learning (Faghri et al., 2018; Wu et al., 2019). Different from image captioning tasks, radiology reports are often longer and consist of multiple sentences, each related to different abnormal findings; meanwhile, there are fewer distinct objects in radiology images and the differences among images are more subtle.…”
Section: Visual-Semantic Embeddings for Cross-Modal Retrieval
confidence: 99%
“…The core issue of most existing studies [9], [26], [27], [33], [40], [42] for image-text matching can be summarized as learning joint representations for both modalities.…”
Section: Related Work (Image-Text Matching)
confidence: 99%
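As a minimal sketch of what "learning joint representations for both modalities" means in practice, the snippet below projects hypothetical image and text features into a shared space with linear maps and scores all pairs by cosine similarity. All names, dimensions, and the random stand-in weights are illustrative assumptions, not the method of any cited paper; in practice the projections are trained with a ranking loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features (e.g., CNN image features and
# sentence-encoder text features); dimensions are illustrative.
img_feats = rng.normal(size=(4, 16))
txt_feats = rng.normal(size=(4, 32))

# Linear projections into a shared 8-d embedding space
# (random stand-ins here; normally learned from paired data).
W_img = rng.normal(size=(16, 8))
W_txt = rng.normal(size=(32, 8))

def joint_similarity(img_feats, txt_feats, W_img, W_txt):
    """Project both modalities into one space and score all pairs by cosine."""
    vi = img_feats @ W_img
    vt = txt_feats @ W_txt
    vi /= np.linalg.norm(vi, axis=1, keepdims=True)
    vt /= np.linalg.norm(vt, axis=1, keepdims=True)
    return vi @ vt.T  # (n_images, n_texts) similarity matrix

sim = joint_similarity(img_feats, txt_feats, W_img, W_txt)
ranked = np.argsort(-sim, axis=1)  # for each image, texts ranked best-first
```

Retrieval in either direction then reduces to ranking rows (image→text) or columns (text→image) of the same similarity matrix.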