Proceedings of the 13th Annual ACM International Conference on Multimedia 2005
DOI: 10.1145/1101149.1101288

Learning the semantics of multimedia queries and concepts from a small number of examples

Abstract: In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task involves answering queries with a few examples. The other involves learning models for semantic concepts, also with a few examples. In our view these two tasks are identical, the only difference being the number of examples available for training. Once we adopt this unified view, we then apply identical techniques for solving both problems and evaluate the performance using the NIST TRECVID benchmark evalua…
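
The abstract's unified view can be made concrete with a small sketch: the same few-example training routine serves both query answering and concept modeling, differing only in how many positives it is given. The feature representation, the SVM choice, and the pseudo-negative sampling below are assumptions for illustration, not the authors' exact pipeline.

# Hypothetical sketch of the paper's unified view: a query with a handful of
# example shots and a semantic concept with more labeled shots are both
# handled by one few-example training routine. Features, dataset, and the
# SVM choice are assumptions, not the authors' exact setup.
import numpy as np
from sklearn.svm import SVC

def train_ranker(positive_feats, unlabeled_feats, n_negatives=100, seed=0):
    """Fit a ranker from a few positives plus randomly sampled pseudo-negatives."""
    rng = np.random.default_rng(seed)
    neg_idx = rng.choice(len(unlabeled_feats), size=n_negatives, replace=False)
    X = np.vstack([positive_feats, unlabeled_feats[neg_idx]])
    y = np.array([1] * len(positive_feats) + [0] * n_negatives)
    return SVC(kernel="rbf", probability=True).fit(X, y)

# Identical call for both tasks; only the number of positives differs.
# query_model   = train_ranker(query_examples,   corpus_feats)  # ~3-10 positives
# concept_model = train_ranker(concept_examples, corpus_feats)  # tens to hundreds
# scores = concept_model.predict_proba(corpus_feats)[:, 1]      # rank shots by score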

Cited by 120 publications (97 citation statements)
References 25 publications (27 reference statements)
“…The dataset consists of 308 video clips of vacation videos, with 19,238 extracted shots and representative keyframes. Clusters: We extract the most descriptive visual features [6] from the keyframes. We use the K-means algorithm to cluster the whole dataset into 100 visually distinctive clusters in the localized color space.…”
Section: Methods (mentioning)
confidence: 99%
“…We use the K-means algorithm to cluster the whole dataset into 100 visually distinctive clusters in the localized color space. The localized color descriptor is extracted from a 5x5 image grid and is represented by the first 3 moments for each grid tile in the LAB color space [6].…”
Section: Methods (mentioning)
confidence: 99%
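
The two excerpts above pin down a concrete recipe: a 5x5 grid per keyframe, the first three moments of each LAB channel per tile (a 5 x 5 x 3 x 3 = 225-dimensional descriptor), and K-means into 100 clusters. A minimal sketch follows; the library choices (scikit-image for the LAB conversion, scikit-learn for K-means) are assumptions, not those of the citing paper.

# Sketch of the descriptor and clustering the citing papers describe:
# per keyframe, a 5x5 grid with the first three moments (mean, standard
# deviation, skewness) of each LAB channel per tile, then K-means into 100
# clusters over the whole dataset. Library choices are assumptions.
import numpy as np
from skimage.color import rgb2lab
from sklearn.cluster import KMeans

def localized_color_moments(rgb_image, grid=5):
    """225-dim descriptor: 5x5 tiles x 3 LAB channels x 3 moments.

    rgb_image: HxWx3 array, uint8 or float in [0, 1].
    """
    lab = rgb2lab(rgb_image)
    h, w = lab.shape[:2]
    feats = []
    for i in range(grid):
        for j in range(grid):
            tile = lab[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid].reshape(-1, 3)
            mean = tile.mean(axis=0)
            std = tile.std(axis=0)
            # Signed cube root of the third central moment, so negative
            # skew survives with its sign.
            skew = np.cbrt(((tile - mean) ** 3).mean(axis=0))
            feats.extend([mean, std, skew])
    return np.concatenate(feats)

# descriptors = np.stack([localized_color_moments(f) for f in keyframes])
# clusters = KMeans(n_clusters=100, n_init=10).fit_predict(descriptors)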
“…In the field of image/video annotation and retrieval, many researchers have simply used randomly selected unlabeled examples as negative examples [12,13,18,20]. However, there is no guarantee that the use of such negative examples will lead to accurate retrieval.…”
Section: Related Work (mentioning)
confidence: 99%
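
The caveat in this excerpt is easy to quantify: negatives drawn uniformly from unlabeled data are contaminated with true positives at roughly the concept's prevalence. A toy sketch with entirely hypothetical numbers:

# Why random pseudo-negatives carry "no guarantee": some sampled
# "negatives" are expected to be true positives. All numbers hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_corpus = 20_000       # unlabeled shots in the corpus (hypothetical)
prevalence = 0.05       # assumed fraction that actually matches the concept
is_positive = rng.random(n_corpus) < prevalence

# The shortcut: sample 100 pseudo-negatives uniformly from the unlabeled pool.
neg_idx = rng.choice(n_corpus, size=100, replace=False)
contamination = is_positive[neg_idx].mean()
print(f"{contamination:.0%} of the sampled 'negatives' are actually positive")
# Expected contamination equals the concept prevalence (~5% here), which is
# exactly the label noise the citing paper warns about.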
“…Usually, either features or modalities will be fused based on their ability to retrieve semantic concepts [17,38,39,43,145]. […] is built from visual and audio modalities, which is later partitioned into bi-modal words that can also be considered as joint patterns across modalities.…”
Section: What Should Be Fused? (mentioning)
confidence: 99%
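
One common reading of "fused based on their ability to retrieve semantic concepts" is late fusion with per-modality weights set by held-out retrieval performance. The sketch below weights each modality by its validation average precision; this particular weighting scheme is an assumption, as the cited works use a variety of schemes.

# Late-fusion sketch: combine per-modality shot scores with weights
# proportional to each modality's validation average precision for the
# target concept. The weighting scheme is an assumption for illustration.
import numpy as np
from sklearn.metrics import average_precision_score

def fuse_scores(modality_scores, val_scores, val_labels):
    """Weight each modality by its held-out retrieval ability, then sum."""
    weights = np.array([average_precision_score(val_labels, s)
                        for s in val_scores])
    weights /= weights.sum()
    return sum(w * s for w, s in zip(weights, modality_scores))

# visual_scores, audio_scores: per-shot scores from two single-modality models
# fused = fuse_scores([visual_scores, audio_scores],
#                     [visual_val_scores, audio_val_scores], val_labels)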