Proceedings of the 2nd ACM International Conference on Multimedia Retrieval 2012
DOI: 10.1145/2324796.2324842

Multimodal feature generation framework for semantic image classification

Cited by 11 publications (8 citation statements, published 2013 to 2023)
References 19 publications

Citation statements:
“…In the early fusion methods [28], the features extracted from the input data are first combined and then fed as input for annotation. In the late fusion methods [38,22], local decisions are first obtained from the different modalities, and these decisions are then combined into the final decision. The major disadvantage of multi-modal methods is that the multimodal features are also required in the prediction process.…”
Section: Introduction
mentioning
confidence: 99%
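
As a rough illustration of the early/late fusion distinction drawn in this statement, the sketch below contrasts the two strategies on synthetic visual and text features. The logistic-regression classifiers, feature dimensions, and equal fusion weights are placeholder assumptions, not the methods of [28], [38], or [22].

```python
# Minimal sketch contrasting early and late fusion of two modalities.
# Feature extractors, dimensions, and classifiers are placeholders, not the
# methods of the works cited in the statement above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X_visual = rng.normal(size=(n, 128))   # e.g. a visual descriptor per image
X_text = rng.normal(size=(n, 300))     # e.g. a bag-of-words / tag vector
y = rng.integers(0, 2, size=n)         # binary label for one concept

# Early fusion: concatenate modality features, train a single classifier.
X_early = np.hstack([X_visual, X_text])
early_clf = LogisticRegression(max_iter=1000).fit(X_early, y)

# Late fusion: train one classifier per modality, then combine their
# per-class scores (here a fixed-weight average) into the final decision.
vis_clf = LogisticRegression(max_iter=1000).fit(X_visual, y)
txt_clf = LogisticRegression(max_iter=1000).fit(X_text, y)

def late_fusion_predict(xv, xt, w_vis=0.5, w_txt=0.5):
    """Combine per-modality class probabilities with fixed weights."""
    p = w_vis * vis_clf.predict_proba(xv) + w_txt * txt_clf.predict_proba(xt)
    return p.argmax(axis=1)

print(early_clf.predict(X_early[:5]))
print(late_fusion_predict(X_visual[:5], X_text[:5]))
```

Note that both variants still need the visual and the text features at prediction time, which is exactly the drawback the quoted passage points out.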
“…Here, a noise filtering algorithm is needed to remove the irrelevant web texts for the BoW-based model. Since web resources have great reliability diversity, it may not be an optimal practice to allocate fixed weights to the visual feature-based and text feature-based classifiers as in [9][10][11][105]. In this chapter, an adaptive fusion algorithm is developed for the integration of the visual feature-based and web textual feature-based classification results.…”
Section: Motivations
mentioning
confidence: 99%
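
To make the fixed-versus-adaptive weighting contrast concrete, here is a minimal sketch in which the text classifier's weight varies per sample with an estimated reliability score. The reliability values are hypothetical stand-ins; the fusion rule below is not the chapter's actual algorithm, nor that of [9][10][11] or [105].

```python
# Sketch of adaptive (per-sample) fusion versus fixed-weight fusion.
# The reliability estimate is a hypothetical stand-in, not the algorithm
# developed in the quoted chapter.
import numpy as np

def fixed_fusion(p_visual, p_text, w_text=0.5):
    """Fixed-weight combination of per-class scores from two classifiers."""
    return (1.0 - w_text) * p_visual + w_text * p_text

def adaptive_fusion(p_visual, p_text, text_reliability):
    """Per-sample combination: the text score's weight follows an estimated
    reliability in [0, 1] (e.g. how well the web text matches the concept
    vocabulary -- assumed here, not taken from the source)."""
    w = np.clip(text_reliability, 0.0, 1.0)[:, None]
    return (1.0 - w) * p_visual + w * p_text

# Toy example: two samples, three classes.
p_vis = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_txt = np.array([[0.1, 0.1, 0.8], [0.4, 0.4, 0.2]])
rel = np.array([0.9, 0.2])  # reliable web text for sample 1, noisy for sample 2
print(fixed_fusion(p_vis, p_txt).argmax(axis=1))     # same weight for both samples
print(adaptive_fusion(p_vis, p_txt, rel).argmax(axis=1))
```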
“…Different from homogeneous web data-aided approaches, heterogeneous web data-aided frameworks [9][10][11] have been developed to explore data of different modalities, such as image tags or descriptions in the form of short text, and to facilitate image classification. Compared to homogeneous frameworks, heterogeneous frameworks not only use the extra images that have the same feature representation for training, but also investigate different feature representations for the web text information.…”
mentioning
confidence: 99%
“…The results of these individual analyses are then fused together to decide which annotations are relevant to the input image. In [114], the authors adopted a late-fusion strategy in which they trained three SVM classifiers: one for a feature vector representing all visual features and two classifiers for two different representations of context data. The classifiers return three scores that are then fed to a final SVM classifier.…”
Section: Combining Visual and Context Features
mentioning
confidence: 99%
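
The stacked late-fusion structure described for [114] (three first-level SVM classifiers whose scores are fed to a final SVM) can be sketched as follows. The synthetic data, the use of class probabilities as scores, and the default RBF kernels are assumptions for illustration, not the original paper's choices.

```python
# Sketch of the stacked late-fusion structure described above: three SVMs
# (one on visual features, two on different context representations) whose
# scores feed a final SVM. Data, kernels, and parameters are placeholders,
# not those of [114].
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 300
X_visual = rng.normal(size=(n, 128))  # all visual features in one vector
X_ctx_a = rng.normal(size=(n, 50))    # context representation 1 (assumed)
X_ctx_b = rng.normal(size=(n, 20))    # context representation 2 (assumed)
y = rng.integers(0, 2, size=n)

# First-level classifiers, one per feature representation.
base = [SVC(probability=True).fit(X, y) for X in (X_visual, X_ctx_a, X_ctx_b)]

def first_level_scores(xv, xa, xb):
    """Stack the positive-class scores of the three SVMs into one vector."""
    return np.column_stack([clf.predict_proba(X)[:, 1]
                            for clf, X in zip(base, (xv, xa, xb))])

# Final SVM trained on the three scores.
fusion = SVC().fit(first_level_scores(X_visual, X_ctx_a, X_ctx_b), y)
print(fusion.predict(first_level_scores(X_visual[:5], X_ctx_a[:5], X_ctx_b[:5])))
```

In practice the fusion SVM would be trained on scores produced from held-out folds so that it does not overfit to the first-level classifiers' training outputs.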