Proceedings of the Eleventh ACM International Conference on Multimedia 2003
DOI: 10.1145/957013.957065

Discriminative model fusion for semantic concept detection and annotation in video

Abstract: In this paper we describe a general information fusion algorithm that can be used to incorporate multimodal cues in building user-defined semantic concept models. We compare this technique with a Bayesian Network-based approach on a semantic concept detection task. Results indicate that this technique yields superior performance. We demonstrate this approach further by building classifiers of arbitrary concepts in a score space defined by a pre-deployed set of multimodal concepts. Results show annotation for u…

Cited by 42 publications (24 citation statements)
References 9 publications
“…The subset is used in the evaluation of high-level feature extraction task in TRECVID 2006. The names and the identity number (ID) of the semantic concepts are listed in Table 1.…”
Section: Extracting Concept Association (mentioning)
confidence: 99%
“…Thus, N classifier are trained, each for one concept. Following the terms in [2], each classifier is treated as a basis model which plays a similar role as the eigenvector in the eigenspace, and it maps the low-level feature into one component in model score space. Traditionally, the N basis models are equally treated, and the feature dimension in the MBT fusion will be N. In Figure 1, the association of a target concept with other concepts varies in a large range from one concept to another.…”
Section: Exploiting Concept Association For Indexing (mentioning)
confidence: 99%
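
The score-space idea in the statement above can be illustrated with a small sketch: N basis classifiers each map a low-level feature vector to one score, the N scores form a vector, and a second classifier for a new target concept is trained on those vectors. Everything below is an assumption made for illustration: the synthetic 2-D features, the two toy basis concepts, the target concept, and the helper names (`train_logistic`, `predict`, `score_vector`). Logistic regression stands in for the discriminative classifier, since the excerpt does not say which one the cited work uses.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, x):
    """Score in (0, 1) for feature vector x under a (weights, bias) model."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(X, y, epochs=500, lr=1.0):
    """Fit logistic regression by full-batch gradient descent; returns (w, b)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = predict((w, b), x) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score_vector(basis_models, x):
    """Map a low-level feature vector to an N-dimensional model score vector."""
    return [predict(m, x) for m in basis_models]


# Synthetic data (assumption): 2-D low-level features, two toy basis concepts.
rng = random.Random(0)


def sample(n):
    return [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]


basis_labelers = [lambda x: x[0] > 0, lambda x: x[1] > 0]   # hypothetical concepts
target_labeler = lambda x: x[0] > 0 and x[1] > 0            # hypothetical target

train_x, test_x = sample(300), sample(300)

# Step 1: one basis model per pre-defined concept, trained on low-level features.
basis_models = [
    train_logistic(train_x, [float(f(x)) for x in train_x])
    for f in basis_labelers
]

# Step 2: train the target-concept classifier in the N-dimensional score space.
train_scores = [score_vector(basis_models, x) for x in train_x]
fusion_model = train_logistic(
    train_scores, [float(target_labeler(x)) for x in train_x]
)

# Step 3: evaluate on held-out samples.
correct = sum(
    (predict(fusion_model, score_vector(basis_models, x)) > 0.5) == target_labeler(x)
    for x in test_x
)
accuracy = correct / len(test_x)
print(f"score-space dimension N = {len(basis_models)}, test accuracy = {accuracy:.2f}")
```

Here every basis model contributes one equally weighted dimension, which matches the "equally treated" baseline the statement describes; weighting or selecting basis models by their association with the target concept would change only how `score_vector` is built.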
“…There are many applications in which audio and video are combined, such as speech recognition [7][8][9][10][11][12][13], speaker recognition [14][15][16], biometric verification [17][18][19][20][21], event detection [22][23][24][25], person or object tracking [26][27][28][29][30][31], active speaker localization and tracking [32,33], music content analysis, emotion recognition, video retrieval, human-computer interaction, voice activity detection, and sound source separation [34][35][36]. Clearly, some applications use face images, and sometimes even movements of the whole body rather than the face alone.…”
Section: Introduction (unclassified)