In this paper, we study the peaky nature of wavelet coefficient distributions. The study shows that the wavelet coefficients cannot be effectively modeled by a single distribution. We then propose a new modeling scheme based on a Laplacian mixture model and apply it to the indexing and retrieval of image and video databases. The parameters of the model are first used to represent texture information in image retrieval. We then explore its application to video retrieval. Traditionally, visual information is used for video indexing and retrieval. However, in some cases audio information is more helpful for finding clues to the video events. Because the proposed feature extraction scheme is based on a fundamental property of the wavelet transform, it can also be adopted to analyze the audio content of the video data. The experimental evaluation indicates the high discriminatory power of the proposed feature set. The dimension of the extracted feature vector is low, which is important for the retrieval efficiency of the system in terms of response time. User feedback is used to enhance the retrieval performance by modifying the system parameters according to the users' behavior. A nonlinear approach for defining the similarity between two images is also explored in this work.
Index Terms: Feature extraction, image indexing and retrieval, Laplacian mixture model, video indexing and retrieval.
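As an illustration of the modeling idea in the abstract above, the following is a minimal sketch of fitting a two-component, zero-mean Laplacian mixture to a set of coefficients with expectation-maximization (EM). The abstract does not specify the number of components, the zero-mean constraint, the estimation procedure, or the exact feature vector, so all of these are assumptions made for illustration. The function names `laplacian_pdf` and `fit_laplacian_mixture` are hypothetical, and the input is synthetic data standing in for the detail coefficients of a wavelet subband.

```python
import numpy as np


def laplacian_pdf(w, b):
    """Zero-mean Laplacian density with scale b."""
    return np.exp(-np.abs(w) / b) / (2.0 * b)


def fit_laplacian_mixture(w, n_iter=200, tol=1e-8):
    """EM fit of a two-component, zero-mean Laplacian mixture (illustrative).

    Returns (alphas, scales), ordered so that scales[0] <= scales[1].
    """
    w = np.asarray(w, dtype=float).ravel()
    a = np.abs(w)

    # Assumed initialization: split the magnitudes at their median so that one
    # component starts narrow (the peak) and the other wide (the tails).
    med = np.median(a)
    b = np.array([
        max(a[a <= med].mean(), 1e-6),
        max(a[a > med].mean(), 1e-6),
    ])
    alpha = np.array([0.5, 0.5])

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each coefficient.
        dens = np.stack([alpha[k] * laplacian_pdf(w, b[k]) for k in range(2)])
        total = dens.sum(axis=0) + 1e-300  # guard against log(0)
        resp = dens / total

        # M-step: mixing weights, then the weighted maximum-likelihood
        # estimate of each scale (weighted mean of absolute values).
        nk = np.maximum(resp.sum(axis=1), 1e-12)
        alpha = nk / w.size
        b = np.maximum((resp * a).sum(axis=1) / nk, 1e-6)

        # Stop when the log-likelihood (evaluated in the E-step) stabilizes.
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol * abs(ll):
            break
        prev_ll = ll

    order = np.argsort(b)
    return alpha[order], b[order]


# Synthetic stand-in for one subband: a narrow and a wide Laplacian component.
rng = np.random.default_rng(0)
n = 20000
from_narrow = rng.random(n) < 0.7
coeffs = np.where(from_narrow,
                  rng.laplace(0.0, 0.5, n),
                  rng.laplace(0.0, 5.0, n))

alphas, scales = fit_laplacian_mixture(coeffs)
print("mixing weights:", alphas)
print("scales:", scales)
```

Under these assumptions, the fitted scales (and possibly the mixing weights) of each subband could be concatenated into a low-dimensional feature vector, which is one plausible reading of how the model parameters serve as texture or audio features.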
This paper presents a new learning algorithm for audiovisual fusion and demonstrates its application to video classification for film databases. The proposed system utilizes perceptual features for content characterization of movie clips. These features are extracted from different modalities and fused through a machine learning process. More specifically, in order to capture the spatio-temporal information, adaptive video indexing is adopted to extract visual features, and a statistical model based on a Laplacian mixture is utilized to extract audio features. These features are fused at the late fusion stage and input to a support vector machine (SVM) to learn semantic concepts from a given video database. Based on our experimental results, the proposed system implementing the SVM-based fusion technique achieves high classification accuracy when applied to a large-volume database containing Hollywood movies.