Automatic image annotation and retrieval using cross-media relevance models

Jeon, Jiwoon; Lavrenko, Victor; Manmatha, R.

doi:10.1145/860458.860459

Cited by 447 publications

(383 citation statements)

References 17 publications

Supporting

Mentioning

380

Contrasting

Unclassified

“…For example, [14] extends and adapts the initial static image annotation approach presented in Jeon et al [22] to create what they call multiple bernoulli relevance models for image and video annotation. In this approach, a substantial time savings is realized by using a fixed sized grid for feature computations as opposed to relying on segmentations as in [22] and [10]. The fixed number of regions also simplifies parameter estimation in their underlying model and makes models of spatial context more straightforward.…”

Section: Adapting Methods For Static Imagery To Video

mentioning

confidence: 99%

“…As we shall see shortly, when we seek to use a kernel density type of approach for extremely large datasets such as those produced by large video collections, we must use some intelligent data structures and potentially some approximations to keep computations tractable. The authors of [14] also argue that their underlying bernoulli model for annotations is more appropriate for image keyword annotations where words are not repeated compared to the multinomial assumptions used in their earlier work [22]. The experimental analysis of the multiple bernoulli model of [14] used a subset of the NIST Video Trec dataset [34].…”

Section: Adapting Methods For Static Imagery To Video

mentioning

confidence: 99%

“…Their model was trained using 4500 Corel images where there are 371 words in total in the vocabulary and each image has 4-5 keywords. Jeon et al [22] used the same Corel data, word annotations and features used in [10]. They used this vocabulary of blobs to construct probabilistic models to predict the probability of generating a word given the blobs in an image.…”

Section: Image Features For Annotation

mentioning

confidence: 99%
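
The statement above describes predicting words from a vocabulary of blobs. A minimal Python sketch of that idea follows, assuming a relevance-model style formulation in which the joint probability of a word and the query blobs is summed over training images with a uniform prior. The function names (`smoothed`, `annotate`), the toy data, the smoothing weights `alpha` and `beta`, and the choice to normalise word and blob counts separately are illustrative assumptions, not details taken from this page.

```python
from collections import Counter


def smoothed(count_in_image, image_total, count_in_collection, collection_total, lam):
    """Interpolate an image-level relative frequency with a collection-level one."""
    return ((1 - lam) * count_in_image / image_total
            + lam * count_in_collection / collection_total)


def annotate(query_blobs, training, alpha=0.1, beta=0.9):
    """Score every vocabulary word against the blobs of an unannotated image.

    training: list of (words, blobs) pairs, one pair per annotated image.
    alpha, beta: smoothing weights for words and blobs (arbitrary defaults).
    """
    word_totals = Counter(w for words, _ in training for w in words)
    blob_totals = Counter(b for _, blobs in training for b in blobs)
    n_words = sum(word_totals.values())
    n_blobs = sum(blob_totals.values())
    prior = 1.0 / len(training)  # assumed uniform prior over training images

    scores = {w: 0.0 for w in word_totals}
    for words, blobs in training:
        word_counts, blob_counts = Counter(words), Counter(blobs)
        # Probability that this training image generates all query blobs.
        blob_likelihood = 1.0
        for b in query_blobs:
            blob_likelihood *= smoothed(
                blob_counts[b], len(blobs), blob_totals[b], n_blobs, beta)
        # Add this image's contribution to the joint probability of each word.
        for w in scores:
            p_word = smoothed(
                word_counts[w], len(words), word_totals[w], n_words, alpha)
            scores[w] += prior * p_word * blob_likelihood

    total = sum(scores.values())
    if total == 0:  # a query blob never seen in training
        return scores
    return {w: s / total for w, s in scores.items()}


# Toy collection: blob identifiers are arbitrary integers (hypothetical data).
training = [
    (["tiger", "grass"], [1, 2, 2]),
    (["sky", "plane"], [3, 4]),
    (["tiger", "water"], [1, 5]),
]
ranking = annotate([1, 2], training)
best_word = max(ranking, key=ranking.get)
```

With this toy collection, `best_word` is the word whose training images share the most blobs with the query; the normalised scores can be thresholded or truncated to the top few words to form an annotation.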

“…They also proposed a simple technique to combine distance computations to create a nearest neighbor classifier suitable for baseline experiments. Furthermore, they showed that this new baseline outperforms state of the art methods on the Corel standard including extensions of Jeon et al [22] such as [14]. The new baseline was also applied to the IAPR TC-12 [18] collection of 19,805 images of natural scenes with a dictionary of 291 words as well as 21,844 images from the ESP collaborative image labeling game [45].…”

Section: Image Features For Annotation

mentioning

confidence: 99%

YouTube Scale, Large Vocabulary Video Annotation

Morsillo

Mann

Pal

2010

Video Search and Mining

As video content on the web continues to expand, it is increasingly important to annotate videos properly for effective search and mining. While annotating static imagery with keywords is relatively well known, annotating videos with natural language keywords to enhance search is an important emerging problem with great potential to improve the quality of video search. However, leveraging web-scale video datasets for automated annotation also presents new challenges and requires methods specialized for scalability and efficiency. In this chapter we review specific, state-of-the-art techniques for video analysis, feature extraction and classification suitable for extremely large-scale automated video annotation. We also review key algorithms and data structures that make truly large-scale video search possible. Drawing from these observations and insights, we present a complete method for automatically augmenting the keyword annotations of videos using previous annotations for a large collection of videos. Our approach is designed explicitly to scale to YouTube-sized datasets, and we present some experiments and analysis of keyword augmentation quality using a corpus of over 1.2 million YouTube videos. We demonstrate that the automated annotation of web-scale video collections is indeed feasible, and that an approach combining visual features with existing textual annotations yields better results than unimodal models.

“…The theory has been extensively studied in image retrieval [17][18][19] and structured document retrieval [20], but has never been applied in such a context.…”

Section: Indexing Model

mentioning

confidence: 99%

Investigation of the Effectiveness of Cross-Media Indexing

Yakici

Crestani

Lecture Notes in Computer Science

Abstract. Cross-media analysis and indexing leverages the indexing information provided by each modality, such as speech, text and image, to improve the effectiveness of information retrieval and filtering in later stages. The process involves not only generating a merged representation of the digital content, such as MPEG-7, but also enriching it to help remedy the imprecision and noise introduced during the low-level analysis phases. It has been hypothesized that a system that combines different media descriptions of the same multi-modal audio-visual segment in a semantic space will perform better at retrieval and filtering time. To validate this hypothesis, we have developed a cross-media indexing system which utilises the Multiple Evidence approach by establishing links among the modality-specific textual descriptions in order to depict topical similarity.

References

2012

Multimedia Information Extraction
