Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2003
DOI: 10.1145/860458.860459

Automatic image annotation and retrieval using cross-media relevance models

Abstract: Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor intensive procedure and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs a…

Cited by 447 publications (383 citation statements)
References: 17 publications
“…For example, [14] extends and adapts the initial static image annotation approach presented in Jeon et al [22] to create what they call multiple bernoulli relevance models for image and video annotation. In this approach, a substantial time savings is realized by using a fixed sized grid for feature computations as opposed to relying on segmentations as in [22] and [10]. The fixed number of regions also simplifies parameter estimation in their underlying model and makes models of spatial context more straightforward.…”
Section: Adapting Methods For Static Imagery To Video (mentioning)
confidence: 99%
“…As we shall see shortly, when we seek to use a kernel density type of approach for extremely large datasets such as those produced by large video collections, we must use some intelligent data structures and potentially some approximations to keep computations tractable. The authors of [14] also argue that their underlying bernoulli model for annotations is more appropriate for image keyword annotations where words are not repeated compared to the multinomial assumptions used in their earlier work [22]. The experimental analysis of the multiple bernoulli model of [14] used a subset of the NIST Video Trec dataset [34].…”
Section: Adapting Methods For Static Imagery To Video (mentioning)
confidence: 99%
“…The theory has been extensively studied in image retrieval [17][18][19] and structured document retrieval [20], but has never been applied in such a context.…”
Section: Indexing Model (mentioning)
confidence: 99%