“…Applications of HMMs to video segmentation and indexing have been reported in recent literature [7], [8], [9]. A successful HMM-based video segmentation and indexing scheme depends greatly on the selection of a suitable multi-dimensional feature vector to represent each image frame in the video stream.…”
Section: Image Features For Semantic Unit HMMs (mentioning)
confidence: 99%
“…Measures of illumination changes at both the pixel level and the histogram level are also included in the multi-dimensional feature vector. A detailed description of these features is provided in [9].…”
Section: Image Features For Semantic Unit HMMs (mentioning)
confidence: 99%
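The statement above mentions illumination-change measures at the pixel level and at the histogram level, with details deferred to [9]. As a rough illustration only, and not the definition used in [9], the sketch below computes one plausible measure of each kind between two grayscale frames. The function names, the list-of-rows frame representation, and the choice of 8 histogram bins are all assumptions made for this example.

```python
def pixel_difference(frame_a, frame_b):
    """Mean absolute per-pixel intensity difference (assumed pixel-level measure).

    Frames are equal-size lists of rows holding grayscale values in 0..255.
    """
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    width = max_value / bins
    n = 0
    for row in frame:
        for p in row:
            # Clamp to the last bin so max intensity never indexes out of range.
            counts[min(int(p / width), bins - 1)] += 1
            n += 1
    return [c / n for c in counts]


def histogram_difference(frame_a, frame_b, bins=8):
    """L1 distance between normalized histograms (assumed histogram-level measure)."""
    ha = histogram(frame_a, bins)
    hb = histogram(frame_b, bins)
    return sum(abs(x - y) for x, y in zip(ha, hb))
```

The two measures respond differently: swapping the positions of pixels within a frame changes the pixel-level value but leaves the histogram-level value at zero, which is one reason a feature vector might reasonably carry both.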
“…Li et al [8] propose an HMM framework to detect play events in sports videos. Eickeler et al [9] use an HMM-based predefined program model to index news programs. In the aforementioned works, however, the system performance could be compromised due to audio-visual mismatch [7] and inaccurate domain-dependent knowledge about the video program structure [8], [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Inspired by the success of modern continuous speech recognition [10], a video stream is modeled at both the semantic unit level and the program level. For each semantic unit, an HMM is generated to model the stochastic behavior of the sequence of image feature emissions [9]. At the program level, a probabilistic grammar is generated by training on video data using maximum likelihood estimation.…”
Section: Introduction (mentioning)
confidence: 99%
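The statement above says the program-level probabilistic grammar is trained on video data by maximum likelihood estimation. One simple way to picture this is a first-order (bigram) grammar over semantic unit labels, where the maximum likelihood estimate of each transition probability is its relative frequency in the training sequences. The bigram form, the function name, and the labels ("anchor", "report", "weather") are assumptions for illustration; the quoted text does not specify the grammar's structure.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum likelihood bigram transition probabilities.

    sequences: list of label sequences, one per training video (hypothetical labels).
    Returns {previous_label: {next_label: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent pair of semantic unit labels.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    grammar = {}
    for prev, next_counts in counts.items():
        total = sum(next_counts.values())
        # The MLE of P(next | prev) is the relative frequency of the pair.
        grammar[prev] = {nxt: c / total for nxt, c in next_counts.items()}
    return grammar


# Hypothetical labeled training programs.
training = [
    ["anchor", "report", "anchor", "weather"],
    ["anchor", "report", "report", "anchor"],
]
grammar = estimate_transitions(training)
```

Because every probability comes from counts in the training data, retraining on a different kind of program yields a different grammar without any manual redefinition of the program model, which is the property the cited work emphasizes.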
“…In contrast to existing HMM-based video segmentation and indexing techniques [9], no domain-dependent knowledge about the structure of video programs is used. This allows the proposed approach to handle a wide variety of video types without having to manually redefine the program model.…”
Semantic video indexing is the first step towards automatic video retrieval and personalization. We propose a data-driven stochastic modeling approach to perform both video segmentation and video indexing in a single pass. Compared with the existing Hidden Markov Model (HMM)-based video segmentation and indexing techniques, the advantages of the proposed approach are as follows: (1) the probabilistic grammar defining the video program is generated entirely from the training data, allowing the proposed approach to handle various kinds of videos without having to manually redefine the program model; (2) the proposed use of the Tamura features improves the accuracy of temporal segmentation and indexing; (3) the need to use an HMM to model the video edit effects is obviated, thus simplifying the processing and collection of training data and ensuring that all video segments in the database are labeled with concepts that have clear semantic meanings, which facilitates semantics-based video retrieval. Experimental results on broadcast news video are presented.
With the emergence of Web 2.0, sharing personal content, communicating ideas, and interacting with other online users in Web 2.0 communities have become daily routines for online users. User-generated data from Web 2.0 sites provide rich personal information (e.g., personal preferences and interests) and can be utilized to obtain insight about cyber communities and their social networks. Many studies have focused on leveraging user-generated information to analyze blogs and forums, but few studies have applied this approach to video-sharing Web sites. In this study, we propose a text-based framework for video content classification of online video-sharing Web sites. Different types of user-generated data (e.g., titles, descriptions, and comments) were used as proxies for online videos, and three types of text features (lexical, syntactic, and content-specific features) were extracted. Three feature-based classification techniques (C4.5, Naïve Bayes, and Support Vector Machine) were used to classify videos. To evaluate the proposed framework, user-generated data from candidate videos, which were identified by searching user-given keywords on YouTube, were first collected. Then, a subset of the collected data was randomly selected and manually tagged by users as our experiment data. The experimental results showed that the proposed approach was able to classify online videos based on users' interests with accuracy rates up to 87.2%, and all three types of text features contributed to discriminating videos. Support Vector Machine outperformed C4.5 and Naïve Bayes techniques in our experiments. In addition, our case study further demonstrated that accurate video-classification results are very useful for identifying implicit cyber communities on video-sharing Web sites.
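To make the idea of classifying videos from their user-generated text more concrete, the sketch below shows a minimal multinomial Naïve Bayes classifier with add-one smoothing, written from scratch. It is a simplified stand-in, not the framework described in the abstract: it uses only whitespace-separated word counts rather than the lexical, syntactic, and content-specific features the study extracts, and the function names, example texts, and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Collect per-label document and word counts.

    docs: list of (text, label) pairs, e.g. a video's title plus comments.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior under add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training texts standing in for titles, descriptions, and comments.
model = train_naive_bayes([
    ("guitar lesson chords tutorial", "music"),
    ("live concert guitar solo", "music"),
    ("goal highlights football match", "sports"),
    ("football penalty shootout", "sports"),
])
```

With this toy model, `classify("guitar concert", model)` selects `"music"` and `classify("football match", model)` selects `"sports"`. The study reports that Support Vector Machine outperformed Naïve Bayes in its experiments, so this sketch illustrates the text-as-proxy idea rather than the best-performing technique.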