2005 IEEE International Conference on Multimedia and Expo
DOI: 10.1109/icme.2005.1521695
Affective Meeting Video Analysis

Cited by 16 publications (10 citation statements). References 8 publications.
“…Based on the assertion that information retrieval in multimedia environments is, in most cases, a combination of search and browsing, a hypermedia navigation concept for lecture recordings is presented in [12]. An experiment is described in [9] where automatically extracted audio-visual features of a video were compared to manual annotations created by users.…”
Section: Related Work
confidence: 99%
“…The training set consisted of 30 seconds of keyboard input, 30 seconds of silence, and one minute of voice (speaking on the phone). We extracted the features described in section 2.6, namely volume, mean pitch, pitch standard deviation, and pitch intensity, as implemented in [22]. We used the MAD framework in Matlab to extract pitch via autocorrelation, with frames of length 1024 (32 kHz sampling rate) and one-second segments.…”
Section: Experiments Four
confidence: 99%
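The pitch-extraction setup quoted above (autocorrelation over 1024-sample frames at 32 kHz, features pooled per one-second segment) can be sketched as follows. This is a minimal illustration, not the cited MAD/Matlab implementation; the function names, the 60–500 Hz search band, and the use of RMS energy for "volume" are assumptions.

```python
import numpy as np

def autocorr_pitch(frame, sr=32000, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one frame via autocorrelation.

    Hypothetical sketch of the quoted approach: frames of 1024
    samples at a 32 kHz sampling rate. The fmin/fmax search band
    is an assumption, not taken from the cited paper.
    """
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi]))   # strongest periodicity
    return sr / lag

def segment_features(signal, sr=32000, frame_len=1024):
    """Volume, mean pitch, and pitch std over one ~1 s segment."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    pitches = np.array([autocorr_pitch(f, sr) for f in frames])
    volume = float(np.sqrt(np.mean(signal ** 2)))   # RMS energy
    return volume, float(pitches.mean()), float(pitches.std())

# Example: one second of a 220 Hz tone; mean pitch lands near 220 Hz.
t = np.arange(32000) / 32000.0
vol, mean_p, std_p = segment_features(np.sin(2 * np.pi * 220.0 * t))
```

Per-frame pitch values pooled over the one-second segment give the mean-pitch and pitch-standard-deviation features the citing paper lists.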
“…We implement an audio classifier for these three classes using simple features such as volume, mean pitch, pitch standard deviation, and pitch intensity (using the method described in [22]). Since there are pauses when a person speaks, a voice segment will often include silence gaps.…”
Section: Audio Processing
confidence: 99%
“…An experiment is described in [10], where automatically extracted audio-visual features of a video were compared to manual annotations created by users.…”
Section: Related Work
confidence: 99%