Proceedings of the 9th International Workshop on Multimedia Data Mining: Held in Conjunction With the ACM SIGKDD 2008 (2008)
DOI: 10.1145/1509212.1509214

Combining image captions and visual analysis for image concept classification

Abstract: We present a framework for efficiently exploiting free-text annotations as a complementary resource to image classification. A novel approach called Semantic Concept Mapping (SCM) is used to classify entities occurring in the text to a custom-defined set of concepts. SCM performs unsupervised classification by exploiting the relations between common entities codified in the WordNet thesaurus. SCM exploits Targeted Hypernym Discovery (THD) to map unknown entities extracted from the text to concepts in WordNet. …
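
The idea of mapping a text entity to one of a custom-defined set of concepts through thesaurus relations can be illustrated with a minimal sketch. This is not the authors' SCM or THD algorithm: the `HYPERNYMS` table is a hypothetical, hand-written stand-in for WordNet, and `hypernym_chain` and `map_to_concept` are illustrative names that simply pick the nearest target concept found among an entity's ancestors.

```python
# Toy hypernym table (child -> parent). A hypothetical stand-in for WordNet,
# used only to keep the example self-contained.
HYPERNYMS = {
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
    "oak": "tree",
    "tree": "plant",
    "sedan": "car",
    "car": "vehicle",
}


def hypernym_chain(term):
    """Return the term followed by its ancestors, nearest ancestor first."""
    chain = [term]
    seen = {term}
    while chain[-1] in HYPERNYMS:
        parent = HYPERNYMS[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the table
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def map_to_concept(entity, concepts):
    """Map an entity to the nearest target concept on its hypernym chain.

    Returns a (concept, distance) pair, or None when no target concept is
    an ancestor of the entity (or the entity is unknown to the table).
    """
    for distance, node in enumerate(hypernym_chain(entity)):
        if node in concepts:
            return node, distance
    return None


# Assumed, custom-defined target concepts for illustration.
TARGET_CONCEPTS = {"animal", "plant", "vehicle"}

if __name__ == "__main__":
    for word in ("poodle", "oak", "sedan", "unicorn"):
        print(word, "->", map_to_concept(word, TARGET_CONCEPTS))
```

An entity missing from the table, such as "unicorn" here, maps to `None`. According to the abstract, this unknown-entity case is what THD is meant to handle, by first linking such entities to WordNet concepts.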

Cited by 26 publications (19 citation statements)
References 20 publications

“…These are extensions of the classic unimodal systems, where a common retrieval system integrates information from various modalities. This can be done by fusing features from different modalities into a single vector [37], [38], [39], or by learning different models for different modalities and fusing their predictions [40], [41]. One popular approach is to concatenate features from different modalities and rely on unsupervised structure discovery algorithms, such as latent semantic analysis, to find multimodal statistical regularities.…”
Section: Previous Work (mentioning)
confidence: 99%
“…In essence, the entity names and their types are described as vectors with the specified features. In Semantic Concept Mapping, with the known list of candidate entity names and labels are denoted as WordNet synsets [19]. The Lin's similarity function describes the type of entity name [25].…”
Section: Related Work (mentioning)
confidence: 99%
“…(19) Given two datasets A and B with target paired values of labeled data samples, the solutions to f and g of Equation 14 and Equation 19 can be used to estimate coordinates of the other unlabeled data points in intermediate spaces, which further can be utilized to align their intrinsic data manifolds.…”
Section: Parallel Field Alignment Retrieval (mentioning)
confidence: 99%
“…To address the cross media retrieval problem, advances have been reported over the last decades [7,26,28]. These methods focus on two traditional ways to design cross media retrieval systems: (a) fusing features from different media data into a single vector [23,33]; (b) learning different models for different media data and fusing their outputs [14,32]. And most of these approaches require multiple-type queries, e.g., queries composed of both image and text features.…”
Section: Introduction (mentioning)
confidence: 99%
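
The two fusion strategies named in the citation statements above, (a) fusing features into a single vector and (b) fusing the outputs of per-modality models, can be contrasted in a short sketch. The function names, the weighted-average rule, and the example scores are assumptions chosen for illustration; none of them come from the cited papers.

```python
def early_fusion(image_features, text_features):
    """(a) Feature-level fusion: concatenate both modalities into one vector.

    A single downstream model would then be trained on the joint vector.
    """
    return list(image_features) + list(text_features)


def late_fusion(image_scores, text_scores, image_weight=0.5):
    """(b) Decision-level fusion: combine per-label scores of two models.

    Uses a weighted average, which is one simple combination rule among
    many. Labels missing from one modality are scored 0.0 there.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    labels = set(image_scores) | set(text_scores)
    return {
        label: image_weight * image_scores.get(label, 0.0)
        + (1.0 - image_weight) * text_scores.get(label, 0.0)
        for label in labels
    }


if __name__ == "__main__":
    # Hypothetical per-modality outputs for one captioned image.
    image_scores = {"beach": 0.8, "city": 0.2}
    text_scores = {"beach": 0.4, "city": 0.6}

    joint_vector = early_fusion([0.1, 0.2], [1.0])
    fused = late_fusion(image_scores, text_scores)
    best_label = max(fused, key=fused.get)
    print(joint_vector, fused, best_label)
```

Early fusion needs both modalities when the joint vector is built, whereas late fusion lets each modality's model be trained and replaced independently.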