Information-theoretic semantic multimedia indexing

Proceedings of the ACM International Conference on Image and Video Retrieval

2010

Self Cite

In this paper, we propose a direct image retrieval framework based on Markov Random Fields (MRFs) that exploits the semantic context dependencies of the image. The novelty of our approach lies in the use of different kernels in our non-parametric density estimation, together with configurations that explore semantic relationships among concepts at the same time as low-level features, instead of focusing only on correlation between image features as in previous formulations. Hence, we introduce several configurations and study which one achieves the best performance. Results are presented for two datasets: the usual benchmark Corel 5k and the collection proposed by the 2009 edition of the ImageCLEF campaign. We observe that, with MRFs, performance on both datasets increases significantly, depending on the kernel used in the density estimation. With respect to the language model, the best results are obtained for the configuration that exploits dependencies between words together with dependencies between words and visual features. For the Corel 5k dataset, our best result corresponds to a mean average precision of 0.32, which compares favourably with the highest value ever obtained, 0.35, achieved by Makadia et al. [22], albeit with different features. For the ImageCLEF09 collection, we obtained a mean average precision of 0.32.

“…Magalhaes & Rüger [21]: 0.28*; Npde, Yavlinsky et al [34]: 0.29*; MBRM, Feng et al [10]: 0.30; SML, Carneiro et al [3]: 0.31; JEC, Makadia et al [22]: 0.35…”

Section: Logregl2 (mentioning)

confidence: 99%

Section: Introduction (mentioning)

confidence: 99%

Image retrieval using Markov Random Fields and global image features

Llorente

Manmatha

Proceedings of the ACM International Conference on Image and Video Retrieval

2010

Self Cite

“…In [Magalhães and Rüger, 2007] we introduced an information-theoretic framework for Equation (5). The current paper proposes and presents a definitive and thorough account of our framework, [Magalhães, 2008].…”

Section: Organization (mentioning)

confidence: 99%

An information-theoretic framework for semantic-multimedia retrieval

Magalhães

2010

ACM Trans. Inf. Syst.

Self Cite

This article is set in the context of searching text and image repositories by keyword. We develop a unified probabilistic framework for text, image, and combined text and image retrieval that is based on the detection of keywords (concepts) using automated image annotation technology. Our framework is deeply rooted in information theory and lends itself to use with other media types. We estimate a statistical model in a multimodal feature space for each possible query keyword. The key element of our framework is to identify feature space transformations that make them comparable in complexity and density. We select the optimal multimodal feature space with a minimum description length criterion from a set of candidate feature spaces that are computed with the average-mutual-information criterion for the text part and hierarchical expectation maximization for the visual part of the data. We evaluate our approach in three retrieval experiments (only text retrieval, only image retrieval, and text combined with image retrieval), verify the framework's low computational complexity, and compare with existing state-of-the-art ad-hoc models.

“…Several techniques to model a keyword with different types of probability density distributions have been used: Feng and Manmatha [4] proposed a Bernoulli model with a vocabulary of visual terms for each keyword, Yavlinsky et al [28] deployed nonparametric density estimation, Carneiro and Vasconcelos [1] a semi-parametric density estimation. Automatic multimedia keyword annotation has also been an active area of research: Snoek et al [24] explore temporal synchronization to combine the multi-modal patterns, Monay and Gatica-Perez explore dependencies across different media [17], while Magalhães and Rüger [15] developed a multimodal maximum entropy framework. The above methods extract features from the multimedia itself, but other, heuristic techniques rely on metadata attached to the multimedia: for example, Lu et al [13] analyse HTML text surrounding an image and assign the most relevant keywords to it.…”

Section: Systems Based On Automatic Abstract (mentioning)

confidence: 99%

Exploring multimedia in a keyword space

Magalhães

Ciravegna

Proceedings of the 16th ACM International Conference on Multimedia

2008

Self Cite

We address the problem of searching multimedia by semantic similarity in a keyword space. In contrast to previous research, we represent multimedia content by a vector of keywords instead of a vector of low-level features. This vector of keywords can be obtained through manual user annotations or computed by an automatic annotation algorithm. In this setting, we studied the influence of two aspects of the search-by-semantic-similarity process: (1) the accuracy of user keywords versus automatic keywords and (2) the functions used to compute semantic similarity between the keyword vectors of two multimedia documents. We consider these two aspects to be crucial in the design of a keyword space that can exploit social-media information and can enrich applications such as Flickr and YouTube. Experiments were performed on an image and a video dataset with a large number of keywords, with different similarity functions and with two annotation methods. Surprisingly, we found that multimedia semantic similarity with automatic keywords performs as well as or better than 95% accurate user keywords.
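
To illustrate the idea of comparing documents in a keyword space, the following is a minimal sketch. It assumes cosine similarity as the similarity function and a dictionary mapping each keyword to a confidence weight as the keyword vector. The abstract does not say which similarity functions were evaluated, so the choice of cosine, the function names and the example data are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword vectors.

    Each vector is a dict mapping keyword -> weight (an assumed
    representation; weights could come from manual or automatic annotation).
    """
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, collection):
    """Return (doc_id, score) pairs sorted by decreasing similarity to query."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical keyword vectors for three documents.
    collection = {
        "img_1": {"beach": 0.9, "sea": 0.8, "sky": 0.4},
        "img_2": {"city": 0.9, "car": 0.7},
        "vid_1": {"sea": 0.6, "boat": 0.9},
    }
    query = {"beach": 1.0, "sea": 1.0}
    for doc_id, score in rank_by_similarity(query, collection):
        print(doc_id, round(score, 3))
```

Because both manual and automatic annotations reduce to the same keyword-weight representation here, the same ranking function can be applied to either, which is the kind of comparison the abstract describes.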