2005 IEEE International Conference on Multimedia and Expo
DOI: 10.1109/icme.2005.1521695
Affective Meeting Video Analysis

Cited by 16 publications (10 citation statements). References 8 publications.
“…Based on the assertion that information retrieval in multimedia environments is, in most cases, a combination of search and browsing, a hypermedia navigation concept for lecture recordings is presented in [12]. An experiment is described in [9] where automatically extracted audio-visual features of a video were compared to manual annotations created by users.…”
Section: Related Work
confidence: 99%
“…The training set consisted of 30 seconds of keyboard input, 30 seconds of silence, and one minute of voice (speaking on the phone). We extracted the features described in section 2.6, namely volume, mean pitch, pitch standard deviation, and pitch intensity, as implemented in [22]. We used the MAD framework in Matlab to extract pitch via autocorrelation, with frames of length 1024 (32 kHz sampling rate) and one-second segments.…”
Section: Experiments Four
confidence: 99%
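The pitch-extraction setup quoted above (autocorrelation over 1024-sample frames at 32 kHz, features pooled per one-second segment) can be sketched as follows. This is a minimal illustration, not the cited MAD/Matlab implementation; the function names, the 60–500 Hz search band, and the use of RMS energy for "volume" are assumptions.

```python
import numpy as np

def autocorr_pitch(frame, sr=32000, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one frame via autocorrelation.

    Hypothetical sketch of the quoted approach: frames of 1024
    samples at a 32 kHz sampling rate. The fmin/fmax search band
    is an assumption, not taken from the cited paper.
    """
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi]))   # strongest periodicity
    return sr / lag

def segment_features(signal, sr=32000, frame_len=1024):
    """Volume, mean pitch, and pitch std over one ~1 s segment."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    pitches = np.array([autocorr_pitch(f, sr) for f in frames])
    volume = float(np.sqrt(np.mean(signal ** 2)))   # RMS energy
    return volume, float(pitches.mean()), float(pitches.std())

# Example: one second of a 220 Hz tone; mean pitch lands near 220 Hz.
t = np.arange(32000) / 32000.0
vol, mean_p, std_p = segment_features(np.sin(2 * np.pi * 220.0 * t))
```

Per-frame pitch values pooled over the one-second segment give the mean-pitch and pitch-standard-deviation features the citing paper lists.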
“…We implement an audio classifier for these three classes using simple features such as volume, mean pitch, pitch standard deviation, and pitch intensity (using the method described in [22]). Since there are pauses when a person speaks, a voice segment will often include silence gaps.…”
Section: Audio Processing
confidence: 99%
“…An experiment is described in [10], where automatically extracted audio-visual features of a video were compared to manual annotations created by users.…”
Section: Related Work
confidence: 99%