2015 IEEE International Conference on Computer Vision (ICCV) 2015
DOI: 10.1109/iccv.2015.298

Automatic Concept Discovery from Parallel Text and Visual Corpora

Abstract: Humans connect language and vision to perceive the world. How to build a similar connection for computers? One possible way is via visual concepts, which are text terms that relate to visually discriminative entities. We propose an automatic visual concept discovery algorithm using parallel text and visual corpora; it filters text terms based on the visual discriminative power of the associated images, and groups them into concepts using visual and semantic similarities. We illustrate the applications of the d…

Cited by 96 publications (59 citation statements)
References 29 publications
“…These captions are called candidate captions. The captions for the query image are selected from these captions pool [47,55,108,130]. These methods produce general and syntactically correct captions.…”
Section: Image Captioning Methods (mentioning)
confidence: 99%
“…Karpathy et al [16] propose a deep visual-semantic alignment (DVSA) model for image retrieval, which uses the BiLSTM to encode query features and R-CNN detector [11] to extract object representations. Sun et al [36] advise an automatic visual concept discovery algorithm to boost the performance of image retrieval. Moreover, Hu et al [15] and Mao et al [24] regard this problem as natural language object retrieval.…”
Section: Image/video Retrieval (mentioning)
confidence: 99%
“…Typically in these approaches, web-crawlers collect easily available noisy multi-modal data [8,12,79] or e-books [17] which is jointly processed for labelling and knowledge extraction. The features are used for diverse applications such as classification and retrieval [68,76] or product description generation [82].…”
Section: Related Work (mentioning)
confidence: 99%