Proceedings of the 13th Annual ACM International Conference on Multimedia 2005
DOI: 10.1145/1101149.1101154

Joint visual-text modeling for automatic retrieval of multimedia documents

Abstract: In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval (IR). We propose a novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval framework. In the state-of-the-art systems, a late combination between two independent systems, one analyzing just the text part of such documents, and the other analyzing…
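For contrast with the joint model the abstract proposes, the baseline it mentions, late combination of two independently built systems, typically reduces to a weighted fusion of per-document scores. A minimal sketch of that baseline, assuming a simple linear mix (the weighting scheme and all names are illustrative, not the paper's formulation):

```python
def late_fusion(text_scores, visual_scores, weight=0.5):
    """Late combination: linearly mix per-document relevance scores from
    two independent retrieval systems (text-only and visual-only)."""
    docs = set(text_scores) | set(visual_scores)
    return {
        d: weight * text_scores.get(d, 0.0)
           + (1 - weight) * visual_scores.get(d, 0.0)
        for d in docs
    }

# Toy usage: rank documents by the fused score.
fused = late_fusion({"doc1": 0.8, "doc2": 0.3}, {"doc1": 0.1, "doc2": 0.9})
print(sorted(fused, key=fused.get, reverse=True))  # ['doc2', 'doc1']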

Cited by 37 publications (37 citation statements) · References 18 publications · Citing publications span 2006–2014.
“…One might expect that these words would be predicted entirely by co-occurrence probabilities with their counterparts. We find that the combined model using both text and vision almost always produces the best results, an observation that is shared with other works in image and video annotation [21,19]. (Table 2: Top-20 annotation accuracy of selected pairs of words which naturally co-occur together.)…”

Section: Methods (supporting)
confidence: 67%
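The top-20 annotation accuracy mentioned in this excerpt is not defined on this page; a common reading is the fraction of test images whose ground-truth word appears among the model's 20 highest-scoring annotation words. A minimal sketch under that assumption (function and variable names are hypothetical):

```python
def top_k_accuracy(scores, ground_truth, k=20):
    """Fraction of images whose true word is ranked in the model's top k.

    scores: one dict per image, mapping candidate word -> model score.
    ground_truth: the correct annotation word for each image.
    """
    hits = 0
    for word_scores, truth in zip(scores, ground_truth):
        # Rank candidate words by descending model score.
        top_k = sorted(word_scores, key=word_scores.get, reverse=True)[:k]
        if truth in top_k:
            hits += 1
    return hits / len(ground_truth)

# Toy usage with k=1: first image is a hit, second is a miss -> 0.5.
scores = [{"tiger": 0.9, "grass": 0.4}, {"sky": 0.2, "plane": 0.7}]
print(top_k_accuracy(scores, ["tiger", "sky"], k=1))
```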
“…The first one is a standard Corel image set which contains 5000 images widely used for comparing results. The second one is the large-scale data set consisting of the entire TRECVID 2003 development dataset and feature set used by [11].…”

Section: Results (mentioning)
confidence: 99%
“…For comparison, the best published retrieval results we know of on the same data set are 0.31 (SML [6]) and 0.30 (NCRM [14]). On a TRECVID dataset [11] the corresponding numbers are 0.152 and 0.158 for the discrete MRF model and the NCRM model respectively. The discrete MRF takes 90 s for all queries while NCRM takes 6.8 hrs.…”

Section: Introduction (mentioning)
confidence: 99%
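The retrieval figures quoted here (0.31, 0.30, 0.152, 0.158) appear to be mean average precision values, the standard ranked-retrieval metric in TRECVID evaluations. As a reference point, a minimal mean-average-precision sketch (the names are illustrative, not from the cited systems):

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@rank at each relevant hit."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy usage: relevant docs retrieved at ranks 1 and 3
# -> AP = (1/1 + 2/3) / 2 ~= 0.833.
print(mean_average_precision([(["d1", "d2", "d3"], ["d1", "d3"])]))
```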
“…Iyengar et al. [11] proposed a probabilistic model that relates words and image parts through an intermediate layer that captures common concepts. Models in this category usually rely on strong assumptions, e.g.…”

Section: Related Work (mentioning)
confidence: 99%
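The intermediate concept layer this excerpt describes can be read as a mixture model: words and image regions are treated as conditionally independent given a latent concept, so P(word | image) = Σ_c P(word | c) · P(c | image). A minimal sketch under that reading (all probabilities below are made-up toy values, not parameters from the paper):

```python
import numpy as np

# Toy latent-concept model: 2 concepts, 3 words. Assumes words and image
# regions are conditionally independent given the concept.
p_word_given_concept = np.array([
    [0.7, 0.2, 0.1],  # concept 0: mostly "tiger"
    [0.1, 0.1, 0.8],  # concept 1: mostly "sky"
])
words = ["tiger", "grass", "sky"]

def annotate(p_concept_given_image):
    """P(word | image) = sum over concepts c of P(word | c) * P(c | image)."""
    p_word = p_concept_given_image @ p_word_given_concept
    return dict(zip(words, p_word))

# An image whose visual features put 0.9 posterior mass on concept 0:
# "tiger" receives the highest word probability (0.64).
print(annotate(np.array([0.9, 0.1])))
```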