1998
DOI: 10.1109/76.718513

Summarization of videotaped presentations: automatic analysis of motion and gesture

Abstract: This paper presents an automatic system for analyzing and annotating video sequences of technical talks. Our method uses a robust motion estimation technique to detect key frames and segment the video sequence into subsequences containing a single overhead slide. The subsequences are stabilized to remove motion that occurs when the speaker adjusts their slides. Any changes remaining between frames in the stabilized sequences may be due to speaker gestures such as pointing or writing, and we use active…

citations
Cited by 73 publications
(36 citation statements)
references
References 30 publications
“…Moreover, in the case of talks, the system should work on transcriptions obtained through Automatic Speech Recognition and this can further reduce the effectiveness of the method. The approach that has been preferred so far is to segment the presentations in correspondence with slide transitions [31], [20], [19], [14]. As mentioned in the previous section, this is reasonable because it reflects the logical organization given by the speaker to his/her talk, but it neglects the actual topics being presented.…”
Section: Previous Work (mentioning)
confidence: 99%
“…The main advantage provided by the system is that the students can avoid taking detailed notes during the courses and focus on main ideas and concepts presented by the teacher. An important application that has been explored is summarization [10], [11], [14]. In [10], the summary is obtained by simply eliminating silences from the audio channel.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Earlier works were mainly based on processing only the visual input. Zhuang et al extracted salient frames based on color clustering and global motion [4], while Ju et al used gesture analysis in addition to the latter low-level features [5]. Furthermore Avrithis et al represent the video content by a high-dimensional feature curve and detect key-frames as the ones that correspond to the curvature points [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…Zhuang et al [41] extracted salient frames based on color clustering and global motion, while Ju et al [13] used gesture analysis in addition to the latter low-level features. Furthermore Avrithis et al [2] represent the video content by a high-dimensional feature curve and detect key-frames at the curvature points.…”
mentioning
confidence: 99%