Proceedings of the 13th Annual ACM International Conference on Multimedia 2005
DOI: 10.1145/1101149.1101288

Learning the semantics of multimedia queries and concepts from a small number of examples

Abstract: In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task involves answering queries with a few examples. The other involves learning models for semantic concepts, also with a few examples. In our view these two tasks are identical, the only difference being the number of examples available for training. Once we adopt this unified view, we then apply identical techniques for solving both problems and evaluate the performance using the NIST TRECVID benchmark evalua…
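
The abstract's unified view can be made concrete with a small sketch: the same few-example training routine serves both query answering and concept modeling, differing only in how many positives it is given. The feature representation, the SVM choice, and the pseudo-negative sampling below are assumptions for illustration, not the authors' exact pipeline.

# Hypothetical sketch of the paper's unified view: a query with a handful of
# example shots and a semantic concept with more labeled shots are both
# handled by one few-example training routine. Features, dataset, and the
# SVM choice are assumptions, not the authors' exact setup.
import numpy as np
from sklearn.svm import SVC

def train_ranker(positive_feats, unlabeled_feats, n_negatives=100, seed=0):
    """Fit a ranker from a few positives plus randomly sampled pseudo-negatives."""
    rng = np.random.default_rng(seed)
    neg_idx = rng.choice(len(unlabeled_feats), size=n_negatives, replace=False)
    X = np.vstack([positive_feats, unlabeled_feats[neg_idx]])
    y = np.array([1] * len(positive_feats) + [0] * n_negatives)
    return SVC(kernel="rbf", probability=True).fit(X, y)

# Identical call for both tasks; only the number of positives differs.
# query_model   = train_ranker(query_examples,   corpus_feats)  # ~3-10 positives
# concept_model = train_ranker(concept_examples, corpus_feats)  # tens to hundreds
# scores = concept_model.predict_proba(corpus_feats)[:, 1]      # rank shots by score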

Cited by 120 publications (97 citation statements)
References 25 publications (27 reference statements)
“…The dataset consists of 308 video clips of vacation videos, with 19,238 extracted shots and representative keyframes. Clusters: We extract the most descriptive visual features [6] from the keyframes. We use the K-means algorithm to cluster the whole dataset into 100 visually distinctive clusters in the localized color space.…”
Section: Methods (mentioning)
confidence: 99%
“…We use the K-means algorithm to cluster the whole dataset into 100 visually distinctive clusters in the localized color space. The localized color descriptor is extracted from a 5x5 image grid and is represented by the first 3 moments for each grid tile in the LAB color space [6].…”
Section: Methods (mentioning)
confidence: 99%
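
The two excerpts above pin down a concrete recipe: a 5x5 grid per keyframe, the first three moments of each LAB channel per tile (a 5 x 5 x 3 x 3 = 225-dimensional descriptor), and K-means into 100 clusters. A minimal sketch follows; the library choices (scikit-image for the LAB conversion, scikit-learn for K-means) are assumptions, not those of the citing paper.

# Sketch of the descriptor and clustering the citing papers describe:
# per keyframe, a 5x5 grid with the first three moments (mean, standard
# deviation, skewness) of each LAB channel per tile, then K-means into 100
# clusters over the whole dataset. Library choices are assumptions.
import numpy as np
from skimage.color import rgb2lab
from sklearn.cluster import KMeans

def localized_color_moments(rgb_image, grid=5):
    """225-dim descriptor: 5x5 tiles x 3 LAB channels x 3 moments.

    rgb_image: HxWx3 array, uint8 or float in [0, 1].
    """
    lab = rgb2lab(rgb_image)
    h, w = lab.shape[:2]
    feats = []
    for i in range(grid):
        for j in range(grid):
            tile = lab[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid].reshape(-1, 3)
            mean = tile.mean(axis=0)
            std = tile.std(axis=0)
            # Signed cube root of the third central moment, so negative
            # skew survives with its sign.
            skew = np.cbrt(((tile - mean) ** 3).mean(axis=0))
            feats.extend([mean, std, skew])
    return np.concatenate(feats)

# descriptors = np.stack([localized_color_moments(f) for f in keyframes])
# clusters = KMeans(n_clusters=100, n_init=10).fit_predict(descriptors)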
“…In the field of image/video annotation and retrieval, many researchers have simply used randomly selected unlabeled examples as negative examples [12,13,18,20]. However, there is no guarantee that the use of such negative examples will lead to accurate retrieval.…”
Section: Related Work (mentioning)
confidence: 99%
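
The caveat in this excerpt is easy to quantify: negatives drawn uniformly from unlabeled data are contaminated with true positives at roughly the concept's prevalence. A toy sketch with entirely hypothetical numbers:

# Why random pseudo-negatives carry "no guarantee": some sampled
# "negatives" are expected to be true positives. All numbers hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_corpus = 20_000       # unlabeled shots in the corpus (hypothetical)
prevalence = 0.05       # assumed fraction that actually matches the concept
is_positive = rng.random(n_corpus) < prevalence

# The shortcut: sample 100 pseudo-negatives uniformly from the unlabeled pool.
neg_idx = rng.choice(n_corpus, size=100, replace=False)
contamination = is_positive[neg_idx].mean()
print(f"{contamination:.0%} of the sampled 'negatives' are actually positive")
# Expected contamination equals the concept prevalence (~5% here), which is
# exactly the label noise the citing paper warns about.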
“…Usually, either features or modalities will be fused based on their ability to retrieve semantic concepts [17,38,39,43,145]. […] is built from visual and audio modalities, which is later partitioned into bi-modal words that can also be considered as joint patterns across modalities.…”
Section: What Should Be Fused? (mentioning)
confidence: 99%
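
One common reading of "fused based on their ability to retrieve semantic concepts" is late fusion with per-modality weights set by held-out retrieval performance. The sketch below weights each modality by its validation average precision; this particular weighting scheme is an assumption, as the cited works use a variety of schemes.

# Late-fusion sketch: combine per-modality shot scores with weights
# proportional to each modality's validation average precision for the
# target concept. The weighting scheme is an assumption for illustration.
import numpy as np
from sklearn.metrics import average_precision_score

def fuse_scores(modality_scores, val_scores, val_labels):
    """Weight each modality by its held-out retrieval ability, then sum."""
    weights = np.array([average_precision_score(val_labels, s)
                        for s in val_scores])
    weights /= weights.sum()
    return sum(w * s for w, s in zip(weights, modality_scores))

# visual_scores, audio_scores: per-shot scores from two single-modality models
# fused = fuse_scores([visual_scores, audio_scores],
#                     [visual_val_scores, audio_val_scores], val_labels)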