2002 IEEE Workshop on Multimedia Signal Processing.
DOI: 10.1109/mmsp.2002.1203271
Video genre verification using both acoustic and visual modes

Cited by 18 publications (9 citation statements)
References 10 publications
“…Roach et al [42] extend their earlier work, classifying video using the audio features described in Roach and Mason [41] as well as visual features obtained in a manner similar to that described in Roach et al [75]. A GMM is used for classification of a linear combination of the conditional probabilities of the audio and visual features.…”
Section: Combination Approaches
confidence: 90%
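The statement above describes late fusion: per-genre Gaussian mixture models scored separately on audio and visual features, with the two conditional scores combined linearly. A minimal sketch of that idea follows, assuming scikit-learn's GaussianMixture, toy random data in place of real MFCC-like audio and visual descriptors, and an illustrative fusion weight; the paper's actual features, model sizes, and weighting are not reproduced here, and log-likelihoods stand in for the probabilities.

```python
# Minimal sketch of GMM-based audio/visual late fusion for genre verification.
# Assumptions (not from the paper): toy features, 4-component diagonal GMMs,
# and an illustrative fusion weight alpha.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy training data: per-genre audio and visual feature vectors.
genres = ["news", "sports"]
train = {
    g: {
        "audio": rng.normal(loc=i, size=(200, 12)),
        "visual": rng.normal(loc=-i, size=(200, 6)),
    }
    for i, g in enumerate(genres)
}

# One GMM per genre and per modality.
models = {
    g: {
        mod: GaussianMixture(n_components=4, covariance_type="diag",
                             random_state=0).fit(feats)
        for mod, feats in mods.items()
    }
    for g, mods in train.items()
}

def fused_score(audio_frames, visual_frames, genre, alpha=0.6):
    """Linear combination of per-modality average log-likelihoods."""
    a = models[genre]["audio"].score(audio_frames)    # mean log p(audio | genre)
    v = models[genre]["visual"].score(visual_frames)  # mean log p(visual | genre)
    return alpha * a + (1.0 - alpha) * v

# Verify/classify a test clip by the genre with the highest fused score.
test_audio = rng.normal(loc=0, size=(50, 12))
test_visual = rng.normal(loc=0, size=(50, 6))
print(max(genres, key=lambda g: fused_score(test_audio, test_visual, g)))
```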
“…Roach et al [42] detect the motion of foreground objects using a frame-differencing approach. Pixel-wise frame differencing of consecutive frames is performed using the Euclidean distance between pixels in the RGB color space.…”
Section: Motion-based Features
confidence: 99%
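A short sketch of the frame-differencing step this statement describes: the Euclidean distance between corresponding RGB pixels of consecutive frames marks moving foreground pixels. The threshold value and the use of random arrays in place of decoded video frames are illustrative assumptions.

```python
# Pixel-wise frame differencing via Euclidean distance in RGB space.
# The threshold and the toy frames are assumptions, not values from the paper.
import numpy as np

def motion_map(prev_frame, curr_frame, threshold=30.0):
    """Boolean mask of pixels whose RGB distance to the previous frame exceeds threshold.

    Both frames are H x W x 3 uint8 arrays.
    """
    diff = curr_frame.astype(np.float32) - prev_frame.astype(np.float32)
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # per-pixel Euclidean distance in RGB
    return dist > threshold

def motion_activity(frames, threshold=30.0):
    """Fraction of 'moving' pixels per consecutive-frame pair (a simple motion feature)."""
    return [motion_map(a, b, threshold).mean()
            for a, b in zip(frames, frames[1:])]

# Toy example with random frames standing in for decoded video.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8) for _ in range(5)]
print(motion_activity(frames))
```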
“…One approach is to use the output of different Hidden Markov Models as the input of a multi-layer perceptron Neural Network (Huang J. et al, 1999). Another approach makes use of a Gaussian mixture model to classify a linear combination of the conditional probabilities of audio and visual features (Roach et al, 2002). A simpler idea is to concatenate different features into a single vector that will be used to train an SVM, as for example described in (Lin and Hauptmann, 2002).…”
Section: Related Work
confidence: 99%
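The statement also contrasts this with a simpler early-fusion alternative (Lin and Hauptmann, 2002): concatenating the audio and visual features into a single vector and training an SVM. A hedged sketch under that reading, with illustrative feature dimensions, labels, and kernel settings:

```python
# Early fusion sketch: concatenate audio and visual features, train one SVM.
# Feature dimensions, labels, and kernel settings are illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_clips = 200
audio_feats = rng.normal(size=(n_clips, 12))    # e.g. clip-level audio statistics
visual_feats = rng.normal(size=(n_clips, 6))    # e.g. clip-level motion/colour statistics
labels = rng.integers(0, 2, size=n_clips)       # two genres for illustration

# Early fusion: a single concatenated feature vector per clip.
X = np.hstack([audio_feats, visual_feats])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:150], labels[:150])
print("held-out accuracy:", clf.score(X[150:], labels[150:]))
```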
“…Roach et al [RMX02] detect the motion of foreground objects using a frame-differencing approach. Pixel-wise frame differencing of consecutive frames is performed using the Euclidean distance between pixels in the RGB color space.…”
Section: Motion-based Features
confidence: 99%