With the explosive growth of the World Wide Web, the public is gaining access to massive amounts of information. However, locating needed and relevant information remains a difficult task, whether the information is textual or visual. Text search engines have existed for some years now and have achieved a certain degree of success. However, despite the large number of images available on the Web, image search engines are still rare. In this article, we show that in order to allow people to profit from all this visual information, there is a need to develop tools that help them to locate the needed images with good precision in a reasonable time, and that such tools are useful for many applications and purposes. The article surveys the main characteristics of the existing systems most often cited in the literature, such as ImageRover, WebSeek, Diogenes, and Atlas WISE. It then examines the various issues related to the design and implementation of a Web image search engine, such as data gathering and digestion, indexing, query specification, retrieval and similarity, Web coverage, and performance evaluation. A general discussion is given for each of these issues, with examples of the ways they are addressed by existing engines, and 130 related references are given. Some concluding remarks and directions for future research are also presented.
This paper presents an unsupervised approach for feature selection and extraction in mixtures of generalized Dirichlet (GD) distributions. Our method defines a new mixture model that is able to extract independent and non-Gaussian features without loss of accuracy. The proposed model is learned using the Expectation-Maximization algorithm by minimizing the message length of the data set. Experimental results show the merits of the proposed methodology in the categorization of object images.
This paper presents an unsupervised algorithm for learning a finite mixture model from multivariate data. This mixture model is based on the Dirichlet distribution, which offers high flexibility for modeling data. The proposed approach for estimating the parameters of a Dirichlet mixture is based on the maximum likelihood (ML) and Fisher scoring methods. Experimental results are presented for the following applications: estimation of artificial histograms, summarization of image databases for efficient retrieval, and human skin color modeling and its application to skin detection in multimedia databases.
We consider the problem of determining the structure of high-dimensional data, without prior knowledge of the number of clusters. Data are represented by a finite mixture model based on the generalized Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. This makes the generalized Dirichlet distribution more practical and useful. An important problem in mixture modeling is the determination of the number of clusters. Indeed, a mixture with too many or too few components may not be appropriate to approximate the true model. Here, we consider the application of the minimum message length (MML) principle to determine the number of clusters. The MML is derived so as to choose the number of clusters in the mixture model which best describes the data. A comparison with other selection criteria is performed. The validation involves synthetic data, real data clustering, and two interesting real applications: classification of web pages, and texture database summarization for efficient retrieval.
This paper deals with a Bayesian analysis of a finite Beta mixture model. We present approximation method to evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. Experimental results concern contextual and non-contextual evaluations. The non-contextual evaluation is based on synthetic histograms, while the contextual one model the class-conditional densities of pattern-recognition data sets. The Beta mixture is also applied to estimate the parameters of SAR images histograms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.