2010
DOI: 10.1504/ijmis.2010.035970

Cross-media retrieval: state-of-the-art and open issues

Cited by 18 publications (6 citation statements)
References 52 publications
“…Among these cross-modal techniques, cross-modal subspace learning methods have achieved state-of-the-art results in recent years [24,40,43,51,52], which have borrowed much inspiration from the conventional subspace approaches [53,54,55,56,57,58,59]. For a comprehensive survey, please refer to [60,61].…”
Section: Introduction (mentioning)
confidence: 99%
“…where X^(1) and X^(2) represent two modalities of data, and V represents the latent semantic representations. P^(1) and P^(2) are the learned projections.…”
Section: Linear Modeling (mentioning)
confidence: 99%
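The quoted statement describes the standard linear model for cross-modal subspace learning: each modality is mapped by its own projection into a shared latent space. Below is a minimal numpy sketch of one common formulation of that idea, minimizing sum_k ||P^(k) X^(k) - V||_F^2 by alternating ridge regressions; the objective, shapes, and names are illustrative assumptions, not the cited paper's exact method.

# Alternating least-squares sketch: two modalities X1, X2 are tied to a
# shared latent representation V through learned projections P1, P2.
# Assumed objective: sum_k || P_k @ X_k - V ||_F^2 with a small ridge term.
import numpy as np

def linear_cross_modal(X1, X2, dim, n_iter=50, ridge=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = X1.shape[1]                      # number of paired samples
    V = rng.standard_normal((dim, n))    # shared latent representations
    for _ in range(n_iter):
        # Update each projection P_k by ridge regression so P_k @ X_k ~ V.
        P1 = V @ X1.T @ np.linalg.inv(X1 @ X1.T + ridge * np.eye(X1.shape[0]))
        P2 = V @ X2.T @ np.linalg.inv(X2 @ X2.T + ridge * np.eye(X2.shape[0]))
        # Update V as the average of the two projected modalities, then
        # renormalize its rows to avoid the trivial shrinking solution.
        V = 0.5 * (P1 @ X1 + P2 @ X2)
        V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    return P1, P2, V

# Usage: 100 paired samples, 64-d image features and 32-d text features.
X1 = np.random.default_rng(1).standard_normal((64, 100))
X2 = np.random.default_rng(2).standard_normal((32, 100))
P1, P2, V = linear_cross_modal(X1, X2, dim=10)

At retrieval time, a query from either modality would be projected with its own P^(k) and matched against projected items of the other modality by a similarity measure such as cosine distance.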
“…This paper aims to conduct a comprehensive survey of cross-modal retrieval. Although Liu et al. [1] gave an overview of cross-modal retrieval in 2010, it does not include many important works proposed in recent years. Xu et al. [2] summarize several methods for modeling multimodal data, but they focus on multi-view learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…The common solution to understanding the relationship between image and text is to map the visual semantic embeddings [9,10] of an image and the corresponding words, phrases, and sentences into a common latent embedding space [9,11-16]. In these methods, the goal is generally to find a common space in which the corresponding representations of image-text pairs are as close as possible, hence making the recognition of their relationship easier.…”
Section: Introduction (mentioning)
confidence: 99%
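The statement above summarizes the common-space approach to image-text matching. The following numpy sketch illustrates that idea under assumed details: linear projections into a shared embedding space, cosine similarity scoring, and a bidirectional margin-based ranking loss over in-batch negatives. All names, shapes, and the specific loss are illustrative assumptions, not a method taken from the cited references.

# Common latent embedding space sketch: project image and text features,
# L2-normalize, and compute a hinge ranking loss that pulls matching
# image-text pairs together and pushes mismatched pairs apart.
import numpy as np

def embed(X, W):
    """Project features X (n, d) with W (d, k) and L2-normalize rows."""
    Z = X @ W
    return Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)

def ranking_loss(img_feats, txt_feats, W_img, W_txt, margin=0.2):
    """Bidirectional hinge ranking loss over all in-batch negatives."""
    I = embed(img_feats, W_img)          # (n, k) image embeddings
    T = embed(txt_feats, W_txt)          # (n, k) text embeddings
    S = I @ T.T                          # (n, n) cosine similarities
    pos = np.diag(S)                     # matching pairs on the diagonal
    cost_i2t = np.maximum(0, margin + S - pos[:, None])  # image -> text
    cost_t2i = np.maximum(0, margin + S - pos[None, :])  # text -> image
    np.fill_diagonal(cost_i2t, 0)        # do not penalize the positives
    np.fill_diagonal(cost_t2i, 0)
    return cost_i2t.sum() + cost_t2i.sum()

# Usage: 8 paired samples, 64-d image / 32-d text features, 16-d space.
rng = np.random.default_rng(0)
loss = ranking_loss(rng.standard_normal((8, 64)), rng.standard_normal((8, 32)),
                    rng.standard_normal((64, 16)), rng.standard_normal((32, 16)))

Minimizing such a loss (typically with gradient descent in an autodiff framework) drives corresponding image-text representations close together in the shared space, which is exactly the property the quoted passage describes.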