Summarization of videotaped presentations: automatic analysis of motion and gesture

Ju, Shanon X.; Black, Michael J.; Minneman, Scott; Kimber, Don

doi:10.1109/76.718513

Cited by 73 publications

(36 citation statements)

References 30 publications


“…Moreover, in the case of talks, the system should work on transcriptions obtained through Automatic Speech Recognition and this can further reduce the effectiveness of the method. The approach that has been preferred so far is to segment the presentations in correspondence with slide transitions [31], [20], [19], [14]. As mentioned in the previous section, this is reasonable because it reflects the logical organization given by the speaker to his/her talk, but it neglects the actual topics being presented.…”

Section: Previous Work

mentioning

confidence: 99%

“…The main advantage provided by the system is that the students can avoid taking detailed notes during the courses and focus on main ideas and concepts presented by the teacher. An important application that has been explored is summarization [10], [11], [14]. In [10], the summary is obtained by simply eliminating silences from the audio channel.…”

Section: Previous Work

mentioning

confidence: 99%


Application of Information Retrieval Technologies to Presentation Slides

Vinciarelli

Odobez

2006

IEEE Trans. Multimedia


Abstract. Presentations are becoming an increasingly more common means of communication in working environments, and slides are often the necessary supporting material on which the presentations rely. In this paper, we describe a slide indexing and retrieval system in which the slides are captured as images (through a framegrabber) at the moment they are displayed during a presentation and then transcribed with an OCR system. In this context, we show that such an approach presents several advantages over the use of commercial software (API based) to obtain the slide transcriptions. We report a set of retrieval experiments conducted on a database of 26 real presentations (570 slides) collected at a workshop. The experiments show that the overall retrieval performance is close to that obtained using either a manual transcription of the slides or the API software. Moreover, the experiments show that the OCR based approach outperforms significantly the API in extracting the text embedded in images and figures.


“…Earlier works were mainly based on processing only the visual input. Zhuang et al extracted salient frames based on color clustering and global motion [4], while Ju et al used gesture analysis in addition to the latter low-level features [5]. Furthermore Avrithis et al represent the video content by a high-dimensional feature curve and detect key-frames as the ones that correspond to the curvature points [6].…”

Section: Introduction

mentioning

confidence: 99%

An Audio-Visual Saliency Model for Movie Summarization

Rapantzikos

Evangelopoulos

Maragos

et al. 2007

2007 IEEE 9th Workshop on Multimedia Signal Processing


“…Zhuang et al [41] extracted salient frames based on color clustering and global motion, while Ju et al [13] used gesture analysis in addition to the latter low-level features. Furthermore Avrithis et al [2] represent the video content by a high-dimensional feature curve and detect key-frames at the curvature points.…”

mentioning

confidence: 99%

Audiovisual Attention Modeling and Salient Event Detection

Evangelopoulos

Rapantzikos

Maragos

et al. 2008

Multimodal Processing and Interaction


Although human perception appears to be automatic and unconscious, complex sensory mechanisms exist that form the preattentive component of understanding and lead to awareness. Considerable research has been carried out into these preattentive mechanisms and computational models have been developed for similar problems in the fields of computer vision and speech analysis. The focus here is to explore aural and visual information in video streams for modeling attention and detecting salient events. The separate aural and visual modules may convey explicit, complementary or mutually exclusive information around the detected audiovisual events. Based on recent studies on perceptual and computational attention modeling, we formulate measures of attention using features of saliency for the audiovisual stream. Audio saliency is captured by signal modulations and related multifrequency band features, extracted through nonlinear operators and energy tracking. Visual saliency is measured by means of a spatiotemporal attention model driven by various feature cues (intensity, color, motion). Features from both modules are mapped to one-dimensional, time-varying saliency curves, from which statistics of salient segments can be extracted and important audio or visual events can be detected through adaptive, threshold-based mechanisms. Audio and video curves are integrated in a single attention curve, where events may be enhanced, suppressed or vanished. Salient events from the audiovisual curve are detected through geometrical features such as local extrema, sharp transitions and level sets. The potential of inter-module fusion and audiovisual event detection is demonstrated in applications such as video key-frame selection, video skimming and video annotation.
