Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Evangelopoulos, Georgios; Zlatintsi, Athanasia; Potamianos, Alexandros; Maragos, Petros; Rapantzikos, Konstantinos; Σκούμας, Γεώργιος; Avrithis, Yannis

doi:10.1109/tmm.2013.2267205

Cited by 220 publications

(170 citation statements)

References 66 publications

Supporting

Mentioning

152

Contrasting

Unclassified

“…All subjects, having recently watched the 3 movies, were independently shown the skims in a consecutive manner and in random order. As in [20], the scale was graded the following way: poor (0%-40%), fair (40%-60%), good (60%-75%), very good (75%-90%) and excellent (90%-100%).…”

Section: Discussion (mentioning)

confidence: 99%

Movie shot selection preserving narrative properties

Mademlis, Tefas, Nikolaidis, et al. 2016

2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)
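
The grading scale quoted in the citation statement above has bands that share their end points (40%, 60%, 75%, 90%), so any implementation has to decide which band owns each boundary. The following is a minimal sketch of one possible mapping. The names `GRADE_BANDS` and `grade` are hypothetical, and assigning each shared boundary to the higher band is an assumption, not something the quoted text specifies.

```python
# Hypothetical helper: maps a percentage score to the quoted grade labels.
# Assumption: each shared boundary (40, 60, 75, 90) belongs to the higher band.
GRADE_BANDS = [
    (90.0, "excellent"),
    (75.0, "very good"),
    (60.0, "good"),
    (40.0, "fair"),
    (0.0, "poor"),
]


def grade(score_percent: float) -> str:
    """Return the label of the band containing score_percent (0 to 100)."""
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score must lie in [0, 100]")
    # Bands are sorted from highest to lowest lower bound,
    # so the first match is the correct band.
    for lower_bound, label in GRADE_BANDS:
        if score_percent >= lower_bound:
            return label
    raise AssertionError("unreachable: 0.0 is the lowest bound")


if __name__ == "__main__":
    for score in (35, 40, 72.5, 90):
        print(score, grade(score))
```

Under the opposite convention (boundaries belong to the lower band), only the comparison operator and the band end points would change.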

“…Table 1. Extracted audio features. Low-level descriptors: Energy, MFCCs (1-12), LPCs (0-13), ZCR, Spectral Flux, Spectral Rolloff, Chroma Vector (0-11), Clarity. Functionals: mean and standard deviation.…”

Section: Low Level Descriptor Functionals (mentioning)

confidence: 99%

“…Emotion recognition is an active research area with many applications such as human assistive systems [1], autonomous video summarisation [2], diagnosing patients' mental illness, monitoring drivers' emotion variations to avoid accidents and helping man-machine interactions [3]. Emotion recognition systems can also find applications in key event detection tasks [2], affective analysis in music [4] or dialogue management [5].…”

Section: Introduction (mentioning)

confidence: 99%

Effective emotion recognition in movie audio tracks

Kotti, Stylianou 2017

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper addresses the problem of speech emotion recognition from movie audio tracks. The recently collected Acted Facial Expression in the Wild 5.0 database is used. The aim is to discriminate among the angry, happy, and neutral classes. We extract a relatively small number of features, a subset of which is not commonly used for the emotion recognition task. These features are fed as input to an ensemble classifier that combines random forests with support vector machines. An accuracy of 65.63% is reported, outperforming a baseline system that uses the K-nearest neighbor classifier and has an accuracy of 56.88%. To verify the suitability of the exploited features, the same ensemble classification scheme is applied to a feature set similar to those employed in the Audio/Visual Emotion Challenge 2011. In the latter case, an accuracy of 61.25% is achieved using a large set of 1582 features, as opposed to just 86 features in our case, which lead to a relative improvement of 7.15% in accuracy.
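
The two figures at the end of this abstract can be checked with simple arithmetic. The sketch below recomputes the relative improvement from the two reported accuracies and, as an assumption, rebuilds the count of 86 features from the descriptor list quoted from Table 1 in the citation statement above. The per-descriptor dimensions (for example, 12 values for MFCCs 1-12 and 14 for LPCs 0-13) are a reading of the index ranges in that quote, and the names `DESCRIPTOR_DIMS`, `feature_count`, and `relative_improvement` are hypothetical.

```python
# Assumption: dimensions are inferred from the index ranges in the quoted Table 1.
DESCRIPTOR_DIMS = {
    "Energy": 1,
    "MFCCs (1-12)": 12,
    "LPCs (0-13)": 14,
    "ZCR": 1,
    "Spectral Flux": 1,
    "Spectral Rolloff": 1,
    "Chroma Vector (0-11)": 12,
    "Clarity": 1,
}
# Functionals listed in the quoted table.
FUNCTIONALS = ("mean", "standard deviation")


def feature_count(dims: dict, functionals: tuple) -> int:
    """Every functional is applied to every descriptor dimension."""
    return sum(dims.values()) * len(functionals)


def relative_improvement(new_acc: float, baseline_acc: float) -> float:
    """Relative gain of new_acc over baseline_acc, in percent."""
    return (new_acc - baseline_acc) / baseline_acc * 100.0


n_features = feature_count(DESCRIPTOR_DIMS, FUNCTIONALS)
# 65.63%: 86-feature set; 61.25%: 1582-feature set (both from the abstract).
gain = relative_improvement(65.63, 61.25)
print(n_features, round(gain, 2))  # prints: 86 7.15
```

Both values agree with the abstract, which suggests the 7.15% figure is measured against the 1582-feature result rather than the K-nearest neighbor baseline.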

“…The process of key frame extraction is also known as "key-framing", "story-boarding" or "static video summarization". A video skim is a video of shorter length than the input stream, which is known as "dynamic video summarization" [1], [5], [6], [7], [8], [9].…”

Section: Introduction (mentioning)

confidence: 99%